A general theory for the processing and use of statistical
observations. In a broader interpretation of the term, statistical decision
theory is the theory of choosing an
optimal non-deterministic behaviour in incompletely known situations.
Inverse problems of probability theory are a subject
of mathematical statistics. Suppose that a random phenomenon $\phi$
occurs, described qualitatively by the measurable space $(\Omega, \mathcal{A})$
of all its elementary events $\omega$
and quantitatively by a probability distribution $P$
of the events. The statistician knows only the qualitative description of $\phi$
and has only incomplete information on $P$, of the type $P \in \mathcal{P}$,
where $\mathcal{P}$ is a family of probability distributions. By making one or more observations of $\phi$
and processing the data thus obtained, the statistician has to make a decision on $P$
and choose the most profitable way to proceed (in particular, it may be
decided that insufficient material has been collected and that the set of
observations has to be extended before final inferences are made). In
classical problems of mathematical statistics, the number
of independent observations (the size of the sample) was
fixed and optimal estimators of the unknown distribution $P$
were sought. The general modern conception of a
statistical decision is attributed to
A. Wald
(see
[2]).
It is assumed that every experiment has a cost which has to
be paid for, and the statistician must meet the loss of a
wrong decision by paying the
"fine"
corresponding to his error. Therefore,
from the statistician's point of view, a decision rule (procedure) $\Pi$
is optimal when it minimizes the risk $\mathfrak{R} = \mathfrak{R}(P, \Pi)$, the
mathematical expectation of his total loss. This approach was
proposed by Wald as the basis of statistical
sequential analysis
and led to the creation in
statistical quality control
of procedures which, with the same accuracy of inference, use on average
almost half as many observations as the classical decision rule. In
the formulation described, any statistical decision problem can be seen
as a two-player game in the sense of
J. von Neumann,
in which
the statistician is one of the players and nature is the other (see
[3]).
However, as early as
1820,
P. Laplace
had likewise
described a statistical estimation problem as a game of chance in
which the statistician is defeated if his estimates are bad.
The value of the risk $\mathfrak{R}(P, \Pi)$
depends both on the decision rule $\Pi$
and on the probability distribution $P$
that governs the results of
the observed phenomenon. As this
"true"
value of $P$
is unknown, the entire
risk function $\mathfrak{R}(P, \Pi)$,
regarded as a function of $P$ over the family $\mathcal{P}$
for a given $\Pi$,
has to be minimized with respect to $\Pi$.
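A decision rule $\Pi_1$
is said to be
uniformly better
than $\Pi_2$
if its risk satisfies $\mathfrak{R}(P, \Pi_1) \le \mathfrak{R}(P, \Pi_2)$
for all $P \in \mathcal{P}$
and $\mathfrak{R}(P, \Pi_1) < \mathfrak{R}(P, \Pi_2)$
for at least one $P \in \mathcal{P}$.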
A decision rule $\Pi$
is said to be
admissible
if no uniformly-better decision rules exist. A class $C$
of decision rules is said to be
complete
(essentially complete)
if for any decision rule $\Pi \notin C$
there is a uniformly-better (not worse) decision rule $\Pi^* \in C$.
The most important complete class is the
minimal
complete class of decision rules, which coincides (when it exists) with the
set of all admissible decision rules. If the minimal complete class contains
precisely one decision rule, then that rule is optimal. Generally,
the risk functions corresponding to admissible decision rules must also be
compared by the value of some other functional, for
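example, the maximum risk. The optimal decision rule $\Pi_0$
in this sense,
$$\sup_{P \in \mathcal{P}} \mathfrak{R}(P, \Pi_0) = \inf_{\Pi} \sup_{P \in \mathcal{P}} \mathfrak{R}(P, \Pi) = \mathfrak{R}_0,$$
is called the
minimax rule.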
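As a minimal sketch of these definitions, the following Python fragment enumerates all deterministic rules in a small two-hypothesis problem, computes their risk vectors, discards the rules that are uniformly worse than some other rule, and picks a rule with the smallest maximum risk. The two distributions, the 0-1 loss and all names (`FAMILY`, `risk_vector`, `uniformly_better`) are illustrative assumptions and not part of the theory itself; randomized rules are deliberately left out, so "admissible" and "minimax" are meant here relative to the deterministic rules only.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical toy problem: two candidate distributions on Omega = {0, 1, 2}.
FAMILY = {
    "P0": [F(6, 10), F(3, 10), F(1, 10)],
    "P1": [F(1, 10), F(3, 10), F(6, 10)],
}
# Decision 0 means "the distribution is P0", decision 1 means "it is P1".
CORRECT = {"P0": 0, "P1": 1}


def risk(name, rule):
    """Expected 0-1 loss of a deterministic rule (tuple: observation -> decision)."""
    return sum(p for x, p in enumerate(FAMILY[name]) if rule[x] != CORRECT[name])


def risk_vector(rule):
    """Risk function of the rule, tabulated over the whole family."""
    return tuple(risk(name, rule) for name in FAMILY)


def uniformly_better(r1, r2):
    """True if risk vector r1 is <= r2 everywhere and < r2 somewhere."""
    return all(a <= b for a, b in zip(r1, r2)) and any(a < b for a, b in zip(r1, r2))


rules = list(product([0, 1], repeat=3))          # all 8 deterministic rules
risks = {rule: risk_vector(rule) for rule in rules}

# Rules that no other deterministic rule dominates.
admissible = [r for r in rules
              if not any(uniformly_better(risks[o], risks[r]) for o in rules)]

# A rule with the smallest maximum risk (ties are broken by enumeration order).
minimax_rule = min(rules, key=lambda r: max(risks[r]))
minimax_value = max(risks[minimax_rule])
```

In this toy problem four of the eight rules survive the dominance filter, and the smallest maximum risk attainable without randomization is 2/5.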
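Comparison using the Bayesian risk is also possible:
$$\mathfrak{R}_\mu(\Pi) = \int_{\mathcal{P}} \mathfrak{R}(P, \Pi)\, \mu(dP),$$
that is, averaging the risk over an a priori probability distribution $\mu$
on the family $\mathcal{P}$.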
This choice of functional is natural, especially when sets
of experiments are repeated with a fixed marginal distribution $P_s$
in the $s$-th
set, whereas the $P_s$, $s = 1, 2, \ldots$,
prove to be a random sequence of measures with unknown distribution $\mu$
(see
Bayesian approach).
The optimal decision rule $\Pi_\mu$ in this sense,
$$\mathfrak{R}_\mu(\Pi_\mu) = \inf_{\Pi} \mathfrak{R}_\mu(\Pi),$$
is called the
Bayesian decision rule
with
a priori distribution $\mu$.
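The Bayesian rule can be illustrated on a finite problem, where the integral over the family reduces to a weighted sum. The sketch below is only an illustration under assumed data: two hypothetical distributions `P[0]` and `P[1]` on three outcomes, 0-1 loss, and a prior given as a pair of weights. It finds the rule of smallest Bayesian risk by brute force and compares it with the rule that, for each observation, picks the hypothesis with the larger prior-weighted likelihood.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical family: two distributions on Omega = {0, 1, 2}.
P = [[F(6, 10), F(3, 10), F(1, 10)],   # P[0]
     [F(1, 10), F(3, 10), F(6, 10)]]   # P[1]


def risk(i, rule):
    """Expected 0-1 loss under P[i]: probability that the rule does not decide i."""
    return sum(p for x, p in enumerate(P[i]) if rule[x] != i)


def bayes_risk(prior, rule):
    """Risk averaged over the prior weights (prior[0], prior[1])."""
    return sum(m * risk(i, rule) for i, m in enumerate(prior))


def bayes_rule(prior):
    """Brute-force minimizer of the Bayesian risk over deterministic rules."""
    return min(product([0, 1], repeat=3), key=lambda r: bayes_risk(prior, r))


def posterior_rule(prior):
    """For each observation x, decide the i maximizing prior[i] * P[i][x]."""
    return tuple(max((0, 1), key=lambda i: prior[i] * P[i][x]) for x in range(3))


prior = (F(2, 3), F(1, 3))
best = bayes_rule(prior)
```

For the prior (2/3, 1/3) both constructions give the same rule, with Bayesian risk 1/5; with 0-1 loss this agreement is expected, since minimizing the averaged risk observation by observation is the same as maximizing the posterior weight.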
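Finally, an a priori distribution $\nu$
is said to be
least favourable
(for the given problem) if
$$\inf_{\Pi} \mathfrak{R}_\nu(\Pi) = \sup_{\mu} \inf_{\Pi} \mathfrak{R}_\mu(\Pi) = \mathfrak{R}_0^*.$$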
Under very general assumptions it has been proved that: 1) for any a priori distribution $\mu$,
a Bayesian decision rule exists; 2) the totality of all Bayesian
decision rules and their limits forms a complete class; and 3)
minimax decision rules exist and are Bayesian rules
relative to the least-favourable a priori distribution, and
$\mathfrak{R}_0^* = \mathfrak{R}_0$, that is, $\sup_{\mu} \inf_{\Pi} \mathfrak{R}_\mu(\Pi) = \inf_{\Pi} \sup_{P \in \mathcal{P}} \mathfrak{R}(P, \Pi)$
(see
[4]).
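The equality of the minimax risk and the largest minimal Bayesian risk can be checked numerically on a toy problem. The sketch below assumes two hypothetical distributions on three outcomes and 0-1 loss, and it searches only over a grid of priors and over mixtures of two deterministic rules with grid weights. The grid over priors gives a lower bound on the sup-inf value and the grid over mixtures gives an upper bound on the inf-sup value; since sup-inf never exceeds inf-sup, the two quantities are pinned down whenever the two bounds coincide.

```python
from fractions import Fraction as F
from itertools import combinations, product

# Hypothetical family: two distributions on Omega = {0, 1, 2}.
P = [[F(6, 10), F(3, 10), F(1, 10)],
     [F(1, 10), F(3, 10), F(6, 10)]]
GRID = [F(k, 100) for k in range(101)]


def risk_vector(rule):
    """(risk under P[0], risk under P[1]) of a deterministic rule, 0-1 loss."""
    return tuple(sum(p for x, p in enumerate(P[i]) if rule[x] != i) for i in (0, 1))


RISKS = [risk_vector(r) for r in product([0, 1], repeat=3)]


def min_bayes_risk(m):
    """Smallest Bayesian risk for the prior (m, 1 - m).

    The Bayesian risk is linear in the rule, so deterministic rules suffice.
    """
    return min(m * r0 + (1 - m) * r1 for r0, r1 in RISKS)


# Lower bound on sup over priors of the minimal Bayesian risk.
maximin, least_favourable = max((min_bayes_risk(m), m) for m in GRID)


def max_risk_of_mixture(a, b, t):
    """Maximum risk of the rule that uses a with probability t and b otherwise."""
    return max(t * a[i] + (1 - t) * b[i] for i in (0, 1))


# Upper bound on the minimax risk over randomized rules.
minimax = min(max_risk_of_mixture(a, b, t)
              for a, b in combinations(RISKS, 2) for t in GRID)
```

Here both bounds equal 1/4 and are attained at the prior (1/2, 1/2), which is therefore least favourable in this toy problem. For larger problems a linear-programming solver would be the more usual tool; the grid search is used only to keep the sketch dependency-free.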
The concrete form of optimal decision rules essentially depends
on the type of statistical problem. However, in
classical problems of statistical estimation, the optimal decision rule when
the samples are large depends weakly on the chosen method of comparing risk functions.
Decision rules in problems of statistical decision theory can
be deterministic or randomized. Deterministic rules are defined by functions,
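for example by a measurable mapping of the space $\Omega^n$
of all samples $(\omega^{(1)}, \ldots, \omega^{(n)})$
of size $n$
into a measurable space $(\Delta, \mathcal{B})$
of decisions $\delta$.
Randomized rules are defined by Markov transition probability distributions of the form
$\Pi(\omega^{(1)}, \ldots, \omega^{(n)}; d\delta)$
from $(\Omega^n, \mathcal{A}^n)$
into $(\Delta, \mathcal{B})$,
which describe the probability distribution according to which the selected value $\delta$
must also be independently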
"chosen"
(see
Statistical experiments, method of;
Monte-Carlo method).
The allowance of randomized procedures makes the set of
decision rules of the problem convex, which greatly facilitates
theoretical analysis. Moreover, problems exist in which the optimal decision
rule is randomized. Even so, statisticians try to avoid randomized rules whenever possible
in practice, since the use of tables or other sources
of random numbers for
"determining"
inferences complicates
the work and may even seem unscientific.
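For finite spaces a randomized rule is simply a row-stochastic matrix: row x holds the probabilities with which each decision is taken after observing x. The following sketch, with a hypothetical two-hypothesis problem and 0-1 loss, represents such a rule, computes its risk exactly, and draws a decision with Python's standard `random` module. All names and numbers are assumptions chosen for illustration.

```python
import random
from fractions import Fraction as F

# Hypothetical family: two distributions on Omega = {0, 1, 2}.
P = [[F(6, 10), F(3, 10), F(1, 10)],
     [F(1, 10), F(3, 10), F(6, 10)]]

# Row x = (probability of decision 0, probability of decision 1) given x.
# Outcomes 0 and 2 are decided deterministically; outcome 1 is settled by a fair coin.
KERNEL = [[F(1), F(0)],
          [F(1, 2), F(1, 2)],
          [F(0), F(1)]]


def is_markov_kernel(kernel):
    """Every row must be a probability distribution over the decisions."""
    return all(min(row) >= 0 and sum(row) == 1 for row in kernel)


def risk(i, kernel):
    """Expected 0-1 loss under P[i]: probability of taking a decision other than i."""
    return sum(p * (1 - kernel[x][i]) for x, p in enumerate(P[i]))


def decide(x, kernel, rng):
    """Draw the decision from the distribution stored in row x of the kernel."""
    weights = [float(w) for w in kernel[x]]
    return rng.choices([0, 1], weights=weights)[0]


rng = random.Random(0)            # seeded only to make the sketch repeatable
sample = [decide(1, KERNEL, rng) for _ in range(5)]
```

This kernel has risk 1/4 under both distributions, whereas no deterministic rule in the same toy problem has maximum risk below 2/5, which is one concrete instance of an optimal rule being randomized.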
A statistical decision rule is by definition a
transition probability distribution from a certain measurable space $(\Omega, \mathcal{A})$
of results of the experiment into a measurable space $(\Delta, \mathcal{B})$
of decisions. Conversely, every transition probability distribution $\Pi(\omega; d\delta)$
can be interpreted as a decision rule in
any statistical decision problem with a measurable space $(\Omega, \mathcal{A})$
of results and a measurable space $(\Delta, \mathcal{B})$
of inferences (it can also be interpreted as
a memoryless communication channel with input alphabet $\Omega$
and output alphabet $\Delta$).
The statistical decision rules form an algebraic category whose objects are the
totalities of all probability distributions on measurable spaces $(\Omega, \mathcal{A})$,
and whose morphisms are the transition probability distributions $\Pi$.
The invariants and equivariants of this category define many
natural concepts and laws of mathematical statistics (see
[5]).
For example, an invariant Riemannian metric, unique up to a factor, exists
on the objects of this category. It is defined by the Fisher
information matrix.
The morphisms of the category generate equivalence and order relations
for parametrized families of probability distributions and for
statistical decision problems, which permits one to give a natural definition of a
sufficient statistic.
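The Kullback non-symmetrical information deviation $I(Q : P)$,
which characterizes the dissimilarity of the probability distributions $Q$
and $P$
(see
Information distance),
is a monotone invariant in the category:
$$I(Q_1 : P_1) \ge I(Q_2 : P_2)$$
if $(Q_1, P_1) \ge (Q_2, P_2)$,
i.e. if $Q_2 = Q_1 \Pi$
and $P_2 = P_1 \Pi$
for a certain transition probability distribution $\Pi$.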
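The monotonicity property can be checked numerically in the finite case, where a transition probability distribution is a row-stochastic matrix and applying it to a distribution is a vector-matrix product. The sketch below uses hypothetical distributions `Q1`, `P1` and a hypothetical kernel; the function names are illustrative, and the deviation is computed with natural logarithms under the assumption that the second argument has no zero entries.

```python
import math


def kl(q, p):
    """Kullback information deviation I(q : p) = sum of q_i * log(q_i / p_i)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def push_forward(dist, kernel):
    """Image of a distribution under a kernel: (dist * kernel)[d] = sum_x dist[x] * kernel[x][d]."""
    n_out = len(kernel[0])
    return [sum(dist[x] * kernel[x][d] for x in range(len(dist)))
            for d in range(n_out)]


# Hypothetical pair of distributions on three outcomes.
Q1 = [0.6, 0.3, 0.1]
P1 = [0.1, 0.3, 0.6]

# Hypothetical kernel from three outcomes to two: the middle outcome is split evenly.
KERNEL = [[1.0, 0.0],
          [0.5, 0.5],
          [0.0, 1.0]]

Q2 = push_forward(Q1, KERNEL)
P2 = push_forward(P1, KERNEL)

before = kl(Q1, P1)
after = kl(Q2, P2)
```

In this example the deviation drops from (1/2) log 6 to (1/2) log 3: passing both distributions through the same kernel cannot make them easier to tell apart.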
If in the problem of statistical estimation by a sample of fixed size

there is a need to estimate the actual marginal probability distribution

of the results of observations, which belongs a priori to a smooth family

,
then, given the choice

for an invariant loss function for the decision

,
the
minimax risk
proved to be
The logic of quantum events is not Aristotelian; random phenomena of
microphysics are therefore not a subject of classical probability theory.
The formalism designed to describe them accepts the existence of
non-commuting random variables and contains the classical theory as
a degenerate commutative scheme. In the corresponding interpretation, many
problems of the theory of quantum-mechanical measurements become
non-commutative analogues of problems of statistical decision theory (see
[6]).