A mathematical science in which the probabilities (cf.
Probability)
of certain random events are used to deduce the probabilities
of other random events which are connected with the former events in some manner.
A statement to the effect that the probability of occurrence of
a certain event is, say, 1/2, is not in itself valuable, since one is interested in
reliable knowledge.
Only results which state that the probability of occurrence of a certain event
is quite near to one or (which is the same thing)
that the probability of the event not occurring is
very small, represent ultimately valuable information. In accordance
with the principle of
"discarding sufficiently small probabilities",
such an event is considered to be
practically certain.
It will be shown below (cf. the section: Limit theorems)
that conclusions of scientific and practical interest are usually
based on the assumption that the occurrence or non-occurrence of an event
depends on a large number of random factors, which
are interconnected only to a minor extent (cf.
Law of large numbers
in connection with this subject). It may also be said, accordingly,
that probability theory is the mathematical science of the laws
governing the interaction of a large number of random factors.
The subject matter of probability theory.
In order to describe a regular connection between certain conditions $S$
and an event $A$, the occurrence or non-occurrence of which can
be established exactly, one of the following two schemes is usually employed in science.
1) The occurrence of event $A$ follows each realization of the conditions $S$.
This is the form of, say, all the laws
of classical mechanics which state that under given initial conditions and
forces acting on a body or a system of bodies,
the motion will proceed in a uniquely determined manner.
2) Under the conditions $S$ the occurrence of event $A$ has a definite probability
$\mathsf{P}(A \mid S)$, which is equal to $p$.
For instance, the laws of radioactive decay state that
for each radioactive substance there is a definite probability
that, in a given period of time, some number $N$
of the atoms of the substance will decay.
The frequency of occurrence of event $A$ in a given sequence of $n$
trials (i.e. $n$ repeated realizations of the conditions $S$)
is the ratio $h = m/n$ of the number $m$ of trials in which $A$
has occurred to the total number of trials $n$.
That there is in fact a definite probability $p$ for $A$ to occur under the conditions $S$
is manifested by the fact that in almost all sufficiently
large sequences of trials the frequency of occurrence of $A$
is approximately equal to $p$.
Any mathematical model which is intended to be
a schematic description of the connection between conditions $S$
and a random event $A$
usually also contains certain assumptions about the nature and
the degree of dependence of the trials. After these
additional assumptions (of which the most frequent one is mutual
independence
of the trials; see the section: Fundamental concepts in probability theory)
have been made, it is possible to give a quantitative,
more precise expression of the somewhat vague statement made above to
the effect that the frequency is close to the probability.
Statistical relationships,
i.e. relationships which may be described by a scheme
of type 2) above, were first noted for games
of chance such as throwing a die. Statistical relationships concerning
births and deaths have been known for a very long time
(e.g. the probability of a newborn (human) baby being a boy
is approximately 0.515). The end of the
19th century
and the first half
of the
20th century
have witnessed the discovery of a
large number of statistical laws in physics, chemistry, biology, and
other sciences. It should be noted that statistical laws are
also involved in schemes not directly related to the concept
of randomness, e.g. in the distribution of digits in tables of functions, etc. (cf.
Random and pseudo-random numbers).
This fact is utilized, in particular, in the
"simulation"
of random phenomena (see
Statistical experiments, method of).
That methods of probability theory can be used in studying
the relationships prevailing in a large number of sciences apparently
unrelated to each other is due to the fact
that probabilities of occurrence of events invariably satisfy certain simple
laws, which will be discussed below (cf. the section: Fundamental concepts
in probability theory). The study of the properties of the probability
of occurrence of events, based on these simple
laws, forms the subject matter of probability theory.
Fundamental concepts in probability theory.
The fundamental concepts in probability theory, as
a mathematical discipline, are most simply exemplified within the framework of so-called
elementary probability theory.
Each trial $T$
considered in elementary probability theory is such that it yields
one and only one outcome or, as it is called, one of the
elementary events $\omega_1, \dots, \omega_s$,
which are supposed to be finite in number. To each outcome $\omega_k$
a non-negative number $p_k$ is connected: the
probability
of this outcome. The sum of the numbers $p_k$
must be one. Consider events $A$
which are characterized by the condition
"either $\omega_i$, or $\omega_j$, $\dots$, or $\omega_k$ occurs."
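The outcomes $\omega_i, \omega_j, \dots, \omega_k$
are said to be
favourable
to $A$ and, by definition, one says that the probability $\mathsf{P}(A)$
of $A$ is equal to the sum of the probabilities of the outcomes favourable to this event:
$$\mathsf{P}(A) = p_i + p_j + \dots + p_k. \qquad (1)$$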
If there are $r$ outcomes favourable to $A$,
then the special case $p_1 = \dots = p_s = 1/s$
yields the formula
$$\mathsf{P}(A) = \frac{r}{s}. \qquad (2)$$
Formula (2)
expresses the so-called
classical concept of probability,
according to which the probability of some event $A$
is equal to the ratio of the number $r$
of outcomes favourable to $A$
to the number $s$ of all
"equally probable"
outcomes. The computation of
probabilities is thus reduced to counting the number of outcomes favourable to $A$,
and often proves to be a difficult problem in combinatorics.
Example.
Each one of the 36 possible outcomes of throwing a pair of dice may be denoted by $(i, j)$,
where $i$ is the number of dots shown by the first die, while
$j$ is the number of dots shown by the second. Event $A$,
"the sum of the dots is 4",
is favoured by three outcomes: $(1, 3)$, $(2, 2)$, $(3, 1)$.
Thus, $\mathsf{P}(A) = 3/36 = 1/12$.
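The classical formula (2) can be checked by brute-force enumeration. The sketch below, using only the Python standard library, lists the 36 outcomes $(i, j)$ and counts those favourable to $A$; the variable names are illustrative, not part of any established API.

```python
from fractions import Fraction
from itertools import product

# All 36 equally probable outcomes (i, j) of throwing a pair of dice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes favourable to A = "the sum of the dots is 4".
favourable = [(i, j) for i, j in outcomes if i + j == 4]

# Classical formula (2): P(A) = r / s.
p_a = Fraction(len(favourable), len(outcomes))

print(favourable)  # [(1, 3), (2, 2), (3, 1)]
print(p_a)         # 1/12
```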
The problem of determining the numerical values of the probabilities
in a given specific problem lies, strictly speaking, outside the scope of
probability theory as a discipline of pure mathematics. In some cases
these values are established as a result of processing the
results of a large number of observations. In other
cases it is possible to predict the probabilities of encountering given
events in a given trial theoretically. Such a prediction is frequently
based on an objective symmetry of the connections between the conditions
under which the trial is conducted and the outcomes of the trials, and
in such cases leads to a formula like
(2).
Let, for instance, the trial
consist in throwing a die in the form of a cube made of a
homogeneous material. One may then assume that each side of the die has
a probability of 1/6 of
"coming out".
In this case the
assumption that all outcomes are equally probable is confirmed by experiment.
Examples of this kind in fact form the
basis of the classical definition of a probability.
A more detailed and thorough explanation for the causes
of equal probabilities of individual outcomes in some
special cases may be given by the so-called
method of arbitrary functions.
The method is explained below by taking again dice throwing
as an example. Let the conditions of the trials be such
that accidental effects of air on the die are negligible. In such
a case, if the initial position, the initial velocity and the
mechanical properties of the die are known exactly, the motion of
the die may be calculated by the methods of classical mechanics, and
the result of the trial may be reliably predicted. In practice,
the initial conditions can never be determined with absolute accuracy and
even very small changes in the initial velocity will
produce a different result, provided the period of time
between the throw and the fall of the die is sufficiently
long. It has been found that, under very general assumptions
with respect to the probability distribution of the initial values (hence
the name of the method), the probability of each one
of the six possible outcomes tends to 1/6 as the time between the throw
and the fall of the die tends to infinity.
A second example is the shuffling of a pack of cards, performed in
order to ensure that all possible arrangements are equally
probable. Here, the transition from the arrangement of the cards after one shuffle
to the arrangement after the next is random. The tendency to equi-probability is established
by methods of the theory of Markov chains (cf.
Markov chain).
Both these cases can be seen as part of general
ergodic theory.
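A toy version of the card-shuffling argument can be computed exactly. The shuffle model below is a hypothetical choice made for illustration (a real riffle shuffle is different): two positions are picked independently and uniformly and the cards there are swapped. Because each shuffle depends only on the current arrangement, the arrangements form a Markov chain, and the distance to the uniform distribution can be tracked step by step for a 3-card deck.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical shuffle model (an assumption for illustration): pick positions
# i and j independently and uniformly and swap the cards there; i == j leaves
# the deck unchanged. Arrangements form a Markov chain on the 3! = 6 orderings.
N = 3
STATES = list(permutations(range(N)))

def shuffle_step(dist):
    """Push a distribution over orderings through one random shuffle."""
    new = {s: Fraction(0) for s in STATES}
    weight = Fraction(1, N * N)
    for deck, p in dist.items():
        for i in range(N):
            for j in range(N):
                d = list(deck)
                d[i], d[j] = d[j], d[i]
                new[tuple(d)] += p * weight
    return new

def distance_to_uniform(dist):
    """Total variation distance between dist and the uniform distribution."""
    u = Fraction(1, len(STATES))
    return sum(abs(p - u) for p in dist.values()) / 2

# Start from the sorted deck with probability one.
dist = {s: Fraction(int(s == STATES[0])) for s in STATES}
distances = []
for k in range(1, 6):
    dist = shuffle_step(dist)
    distances.append(distance_to_uniform(dist))
    print(k, float(distances[-1]))
```

With this model the distance falls from 1/3 after one shuffle to 1/243 after five.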
Given a certain number of events, two new events may
be defined: their union (sum) and combination (product, intersection). The event $B$:
"at least one of $A_1, \dots, A_r$ occurs",
is said to be the
union
of events $A_1, \dots, A_r$.
The event $C$:
"all of $A_1, \dots, A_r$ occur",
is said to be the
combination
or
intersection
of events $A_1, \dots, A_r$.
The symbols for union and intersection of events are $\cup$ and $\cap$,
respectively. Thus:
$$B = A_1 \cup \dots \cup A_r, \qquad C = A_1 \cap \dots \cap A_r.$$
Two events $A$ and $B$ are said to be
mutually exclusive
if their joint occurrence is impossible, i.e. if none
of the possible results of a trial favours both $A$ and $B$.
If the events $A_i$ are identified with the sets of their favourable outcomes, events $B$ and $C$
will be identical with the union and the intersection of the respective sets.
Two fundamental theorems in probability theory — theorems on
addition and multiplication of probabilities — are
connected with the operations just introduced.
The
theorem on addition of probabilities.
If the events $A_1, \dots, A_r$
are such that any two of them are mutually exclusive, the probability
of their union is equal to the sum of their probabilities:
$$\mathsf{P}(A_1 \cup \dots \cup A_r) = \mathsf{P}(A_1) + \dots + \mathsf{P}(A_r).$$
Thus, in the example mentioned above (throwing a pair of dice), the event $B$,
"the sum of the dots is 4 or less",
is the union of the three mutually exclusive events $A_2, A_3, A_4$
in which the sum of the dots is 2, 3 and 4, respectively.
The probabilities of these events are 1/36, 2/36 and
3/36, respectively; in accordance with the addition theorem,
$\mathsf{P}(B)$ is equal to
$$\frac{1}{36} + \frac{2}{36} + \frac{3}{36} = \frac{6}{36} = \frac{1}{6}.$$
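A minimal enumeration sketch of the addition theorem for this example: the probability of the union is computed directly from the outcomes and compared with the sum of the probabilities of the three parts. The helpers `prob` and `sum_is` are hypothetical names introduced here.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability (2) of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

def sum_is(s):
    """Event "the sum of the dots is s"."""
    return lambda w: w[0] + w[1] == s

parts = [prob(sum_is(s)) for s in (2, 3, 4)]  # mutually exclusive events
union = prob(lambda w: w[0] + w[1] <= 4)      # "the sum is 4 or less"

print(parts)              # [Fraction(1, 36), Fraction(1, 18), Fraction(1, 12)]
print(union, sum(parts))  # 1/6 1/6
```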
The
conditional probability $\mathsf{P}(B \mid A)$
of event $B$ occurring if condition $A$ is met is defined (for $\mathsf{P}(A) > 0$) by the formula
$$\mathsf{P}(B \mid A) = \frac{\mathsf{P}(A \cap B)}{\mathsf{P}(A)},$$
which may be shown to be in complete agreement
with the properties of the frequencies of occurrence. Events $A_1, \dots, A_r$
are said to be
independent
if the conditional probability of any one of the events occurring under the
condition that some of the other events have also
occurred is equal to its
"unconditional"
probability (see also
Independence
in probability theory).
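The definition of conditional probability and the notion of independence can be illustrated on the same dice space. In the sketch below (helper names are hypothetical) the event "the sum is 4" turns out to depend on "the first die shows 1", while "the second die shows 6" does not.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

def cond_prob(b, a):
    """P(B | A) = P(A and B) / P(A); requires P(A) > 0."""
    return prob(lambda w: a(w) and b(w)) / prob(a)

def first_is_1(w):
    return w[0] == 1

def sum_is_4(w):
    return w[0] + w[1] == 4

def second_is_6(w):
    return w[1] == 6

print(cond_prob(sum_is_4, first_is_1), prob(sum_is_4))        # 1/6 1/12
print(cond_prob(second_is_6, first_is_1), prob(second_is_6))  # 1/6 1/6
```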
The
theorem on multiplication of probabilities.
The probability of joint occurrence of events $A_1, \dots, A_r$
is equal to the probability of occurrence of event $A_1$,
multiplied by the probability of occurrence of event $A_2$
on the condition that $A_1$ has in fact occurred, and so on,
up to the probability of occurrence of event $A_r$
on the condition that the events $A_1, \dots, A_{r-1}$
have in fact occurred:
$$\mathsf{P}(A_1 \cap \dots \cap A_r) = \mathsf{P}(A_1)\,\mathsf{P}(A_2 \mid A_1) \cdots \mathsf{P}(A_r \mid A_1 \cap \dots \cap A_{r-1}).$$
If the events are independent, the multiplication theorem yields the formula
$$\mathsf{P}(A_1 \cap \dots \cap A_r) = \mathsf{P}(A_1)\,\mathsf{P}(A_2) \cdots \mathsf{P}(A_r), \qquad (3)$$
i.e. the probability of joint occurrence of independent events is equal
to the product of the probabilities of these events. Formula (3)
remains valid
if some of the events are replaced in both its parts by the complementary events.
Example.
Four shots are fired at a target, the probability of hitting the target being
0.2 with each shot. The hits scored in different shots are considered
to be independent events. What will be the
probability of hitting the target exactly three times?
Each outcome of a trial can be symbolized by a sequence of four letters, $h$ for a hit and $m$ for a miss
(e.g. $hmmh$ means that the first and fourth shots were hits, while the second and
the third shots were misses). The total number of outcomes will be
$2 \cdot 2 \cdot 2 \cdot 2 = 16$.
Since the results of individual shots are assumed
to be independent, the probability of the outcomes must be determined with the aid of
formula (3), including the comment which accompanies it. Thus, the probability of the outcome $hmmh$
will be
$$0.2 \cdot 0.8 \cdot 0.8 \cdot 0.2 = 0.0256,$$
where $0.8 = 1 - 0.2$ is the probability of a miss in a single shot.
The outcomes favouring the event
"the target is hit three times"
are $hhhm$, $hhmh$, $hmhh$ and $mhhh$.
The probabilities of all four outcomes are equal:
$$0.2 \cdot 0.2 \cdot 0.2 \cdot 0.8 = \dots = 0.8 \cdot 0.2 \cdot 0.2 \cdot 0.2 = 0.0064,$$
so that the probability of the event is
$$4 \cdot 0.0064 = 0.0256.$$
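The example can be reproduced by listing all 16 outcomes as strings of the letters h and m and applying formula (3) to each. This is a sketch for illustration; the constant and function names are assumptions.

```python
from itertools import product

P_HIT = 0.2          # probability of a hit in a single shot
P_MISS = 1 - P_HIT   # probability of a miss

def outcome_prob(seq):
    """Probability of a sequence such as 'hmmh' for independent shots, formula (3)."""
    p = 1.0
    for letter in seq:
        p *= P_HIT if letter == "h" else P_MISS
    return p

all_outcomes = ["".join(s) for s in product("hm", repeat=4)]
three_hits = [s for s in all_outcomes if s.count("h") == 3]
total = sum(outcome_prob(s) for s in three_hits)

print(len(all_outcomes))  # 16
print(three_hits)         # ['hhhm', 'hhmh', 'hmhh', 'mhhh']
print(round(total, 6))    # 0.0256
```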
A generalization of the above reasoning leads to one of
the fundamental formulas in probability theory: If the events $A_1, \dots, A_n$
are independent and if the probability of each individual event occurring is $p$,
then the probability of occurrence of exactly $m$
such events is
$$\mathsf{P}_n(m) = C_n^m\, p^m (1 - p)^{n - m}, \qquad (4)$$
where $C_n^m = \dfrac{n!}{m!\,(n - m)!}$
denotes the number of combinations of $m$
elements out of $n$
elements (see
Binomial distribution).
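Formula (4) translates directly into code; `math.comb` (Python 3.8 or later) supplies $C_n^m$. The function name `binomial_prob` is illustrative.

```python
from math import comb

def binomial_prob(n, m, p):
    """Formula (4): probability that exactly m of n independent events occur,
    each event having probability p."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

# Four shots with hit probability 0.2: probability of exactly three hits.
print(round(binomial_prob(4, 3, 0.2), 6))  # 0.0256
# The probabilities for m = 0, ..., n add up to one.
print(round(sum(binomial_prob(4, m, 0.2) for m in range(5)), 12))  # 1.0
```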
If $n$ is large, computations according to formula (4)
become laborious. In the above example, let the number of
shots be 100; one has to find the probability $P$
of the number of hits being between 8 and 32. The use
of formula (4)
and of the addition theorem yields an
accurate, but unwieldy expression for the probability value sought, namely:
$$P = \sum_{m=8}^{32} C_{100}^m (0.2)^m (0.8)^{100-m}.$$
An approximate value of the probability $P$
may be found by the use of the
Laplace theorem.
Here $np = 20$ and $\sqrt{np(1-p)} = 4$, so the bounds 8 and 32 lie three standard deviations on either side of the mean, and
$$P \approx \frac{1}{\sqrt{2\pi}} \int_{-3}^{3} e^{-z^2/2}\,dz \approx 0.9973,$$
the error not exceeding 0.0009. This result shows that the occurrence of the event
$8 \le m \le 32$
is practically certain. This is a very simple, but typical, example of the use of
limit theorems
in probability theory.
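The exact sum and the Laplace approximation can be compared numerically. The sketch below expresses the standard normal distribution function through `math.erf`; if the figures quoted above are right, the printed error should come out below 0.0009.

```python
from math import comb, erf, sqrt

n, p = 100, 0.2
q = 1 - p

# Exact value: formula (4) summed over m = 8, ..., 32 (addition theorem).
exact = sum(comb(n, m) * p**m * q ** (n - m) for m in range(8, 33))

def std_normal_cdf(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Laplace approximation: standardize the bounds with mean np and s.d. sqrt(npq).
mean, sd = n * p, sqrt(n * p * q)          # 20 and 4, up to rounding
a, b = (8 - mean) / sd, (32 - mean) / sd   # -3 and 3, up to rounding
approx = std_normal_cdf(b) - std_normal_cdf(a)

print(f"exact  = {exact:.5f}")
print(f"approx = {approx:.5f}")
print(f"error  = {abs(exact - approx):.5f}")
```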
Another fundamental formula in elementary probability theory is the so-called
formula of total probability:
If events $A_1, \dots, A_r$
are pairwise mutually exclusive and if their union is
the sure event, the probability of any single event $B$
is equal to the sum
$$\mathsf{P}(B) = \sum_{k=1}^{r} \mathsf{P}(A_k)\,\mathsf{P}(B \mid A_k).$$
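As a check of the formula of total probability, one may take as the $A_k$ the six hypotheses "the first die shows $k$" and as $B$ the event "the sum of the dots is 4" from the earlier example; the sketch below should recover $\mathsf{P}(B) = 1/12$. The dictionary names are illustrative.

```python
from fractions import Fraction

# Hypotheses A_k = "the first die shows k": pairwise exclusive, union is sure.
p_hyp = {k: Fraction(1, 6) for k in range(1, 7)}

# P(B | A_k) for B = "the sum of the dots is 4": the second die must show 4 - k.
p_b_given = {k: Fraction(1, 6) if 1 <= 4 - k <= 6 else Fraction(0) for k in range(1, 7)}

# Formula of total probability.
p_b = sum(p_hyp[k] * p_b_given[k] for k in range(1, 7))
print(p_b)  # 1/12
```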
The theorem on multiplication of probabilities is particularly useful when
compound trials
are considered. One says that a trial $T$
is composed of trials $T_1, \dots, T_n$
if each outcome of $T$
is a combination of certain outcomes $A_i, B_j, \dots, X_k, Y_l$
of the respective trials $T_1, T_2, \dots, T_{n-1}, T_n$.
Frequently one is in the situation where the probabilities
$$\mathsf{P}(A_i), \quad \mathsf{P}(B_j \mid A_i), \quad \dots, \quad \mathsf{P}(Y_l \mid A_i \cap B_j \cap \dots \cap X_k) \qquad (5)$$
are, for some reason, known. The data in (5)
together with
the multiplication theorem may then be used to determine the probabilities $\mathsf{P}(E)$
for all outcomes $E = A_i \cap B_j \cap \dots \cap X_k \cap Y_l$
of the compound trial, as well as the
probabilities of all events connected with this trial (as was
done in the example discussed above). Two types of
compound trials are especially important in practice: A) the individual trials are
independent,
i.e. the probabilities in (5)
are equal to the unconditional probabilities $\mathsf{P}(A_i), \mathsf{P}(B_j), \dots, \mathsf{P}(Y_l)$;
B) the probabilities of the outcomes of a given trial
are only affected by the outcomes of the immediately
preceding trial, i.e. the probabilities in (5)
are equal, respectively, to $\mathsf{P}(A_i), \mathsf{P}(B_j \mid A_i), \dots, \mathsf{P}(Y_l \mid X_k)$.
One then says that the trials are
connected in a Markov chain.
The probabilities of all events connected with a compound
trial are here fully determined by the initial probabilities $\mathsf{P}(A_i)$
and by the transition probabilities $\mathsf{P}(B_j \mid A_i), \dots, \mathsf{P}(Y_l \mid X_k)$
(cf.
Markov process).
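For trials connected in a Markov chain, the probability of a sequence of outcomes is the initial probability times the successive transition probabilities. The sketch below uses a hypothetical two-outcome chain (outcomes labelled `a` and `b`, with made-up probabilities) to compute the probability of one path and the distribution of the outcome of the second trial.

```python
from fractions import Fraction as F

# Hypothetical two-outcome chain; the numbers are illustrative only.
initial = {"a": F(1, 2), "b": F(1, 2)}
transition = {  # transition[x][y] = P(next outcome is y | current outcome is x)
    "a": {"a": F(3, 4), "b": F(1, 4)},
    "b": {"a": F(1, 3), "b": F(2, 3)},
}

def path_prob(path):
    """Multiplication theorem for trials connected in a Markov chain."""
    p = initial[path[0]]
    for x, y in zip(path, path[1:]):
        p *= transition[x][y]
    return p

def marginal(n_trials):
    """Distribution of the outcome of the n-th trial."""
    dist = dict(initial)
    for _ in range(n_trials - 1):
        dist = {y: sum(dist[x] * transition[x][y] for x in dist) for y in initial}
    return dist

print(path_prob("aab"))  # 3/32
print(marginal(2))       # {'a': Fraction(13, 24), 'b': Fraction(11, 24)}
```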
Random variables.
If each outcome $\omega_k$ of a trial $T$
is put into correspondence with a number $x_k$,
one says that a
random variable $\xi$
has been specified. Among the numbers $x_1, \dots, x_s$
there may be equals; the set of
different
values of $x_k$,
where $k = 1, \dots, s$,
is the set of
possible values
of the random variable. The set of possible values of a random
variable, together with their respective probabilities, is said to be the
probability distribution
of the random variable. Thus, in the example of throwing a pair of dice, to each outcome $(i, j)$
of the trial there corresponds the value $\xi = i + j$ of the random variable $\xi$,
which is the sum of the dots on the two dice. The possible values are $2, 3, \dots, 12$
and their respective probabilities are
$1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36$.
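The probability distribution of the sum of the dots can be obtained by grouping the 36 outcomes according to the value of $\xi$; a minimal sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Random variable xi = i + j on the 36 outcomes (i, j); equal values are merged.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
distribution = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```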
In a joint study of several random variables one introduces the concept of their
joint distribution,
which is defined by indicating the possible values of each
one, and the probabilities of joint occurrence of the events
$$\{\xi_1 = x_1\}, \quad \{\xi_2 = x_2\}, \quad \dots, \quad \{\xi_n = x_n\}, \qquad (6)$$
where $x_k$ is one of the possible values of the variable $\xi_k$.
Random variables are said to be
independent
if the events in (6)
are independent whatever the choice of the $x_k$.
The joint distribution of random variables can be used to
calculate the probability of any event defined by these variables, e.g. of the event
$\{a < \xi_1 + \dots + \xi_n < b\}$, etc.
Often, instead of giving the distribution of a random variable
completely, one uses a small number of
numerical characteristics. The ones most often used are the
mathematical expectation
and the
dispersion
(variance). (See also
Moment;
Semi-invariant.)
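For the random variable $\xi$ of the dice example (the sum of the dots), the mathematical expectation and the variance $\mathsf{E}(\xi - \mathsf{E}\xi)^2$ can be computed exactly from the 36 equally probable outcomes; the sketch below should give 7 and 35/6.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
values = [i + j for i, j in outcomes]  # xi = sum of the dots
p = Fraction(1, len(outcomes))         # each outcome has probability 1/36

mean = sum(p * v for v in values)                     # E xi
variance = sum(p * (v - mean) ** 2 for v in values)   # E (xi - E xi)^2
print(mean, variance)  # 7 35/6
```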
The fundamental characteristics of a joint distribution of several random
variables include, in addition to the mathematical expectations and
the variances of these variables, the correlation coefficients (cf.
Correlation coefficient),
etc. The meaning of these characteristics can be made
clear, to a considerable extent, by limit theorems (see the section: Limit theorems).
The scheme of trials with a finite number of outcomes
proves inadequate even in the simplest applications of probability theory.
Thus, in the study of the random dispersion of the
hitting sites of projectiles around the centre of a target, or
in the study of random errors in the determination of some value,
etc., it is not possible to limit the model to
trials with a finite number of outcomes. Moreover, such outcomes may, in
some cases, be expressed by a number or a set of numbers,
while in other cases the outcome of a trial may be
a function (e.g. a record of the variation of atmospheric pressure
at a given location over a certain period of time), a
set of functions, etc. It should be noted
that many definitions and theorems given above, after suitable
modifications, are also applicable in these more general cases,
although the forms in which the probability distribution is presented are different (cf.
Density of a probability distribution;
Probability distribution).
Here, the classical
"equal probability of each outcome"
is replaced by a
uniform distribution
of the objects under consideration in some area (this is exactly what
is meant when speaking of a point randomly selected in a
given area, a randomly selected tangent to some figure, etc.).
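A point randomly selected in the unit square is the simplest case of such a uniform distribution: the probability that it falls in a given region equals the region's area. The Monte Carlo sketch below (seed and sample size are arbitrary choices) estimates the probability of landing in the quarter disc $x^2 + y^2 \le 1$, whose area is $\pi/4$.

```python
import math
import random

rng = random.Random(12345)  # fixed seed so the run is reproducible
n = 200_000

# A "point randomly selected in the unit square": x and y independent, uniform on [0, 1).
hits = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
estimate = hits / n

print(estimate, math.pi / 4)
```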
Major changes are introduced in the definition of a probability which, in
the elementary case, is given by formula (1).
In the more general schemes now discussed, events are unions
of infinitely many elementary events, each of which
may have probability zero. Thus, the property which is described
by the addition theorem is not a consequence of
the definition of probability, but is part of it.
The logical scheme of constructing the fundamentals of
probability theory which is most often employed was developed in
1933
by
A.N. Kolmogorov.
The fundamental characteristics of
this scheme are the following. In studying a real
problem by the methods of probability theory, the first step is to isolate a set $\Omega$
of elements $\omega$,
called
elementary events.
Any event can be fully described by the set of elementary events
favourable to it, and is therefore considered as
some set of elementary events. To some events $A$
are assigned certain numbers $\mathsf{P}(A)$,
which are called their probabilities and which satisfy the following conditions:
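1) $0 \le \mathsf{P}(A) \le 1$;
2) $\mathsf{P}(\Omega) = 1$;
3) if the events $A_1, \dots, A_n$
are pairwise mutually exclusive, and if $A$
is their union, then
$$\mathsf{P}(A) = \mathsf{P}(A_1) + \dots + \mathsf{P}(A_n)$$
(additivity of probabilities).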
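In a finite model these conditions can be checked mechanically. The sketch below takes $\Omega$ to be the six faces of one die, lets every subset be an event, and verifies conditions 1), 2) and 3) (the last for two events at a time) on all 64 events; the model and the helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import chain, combinations

# A finite model: Omega = faces of one die, every subset is an event, and
# P(A) is the sum of the point probabilities of the outcomes in A.
omega = frozenset(range(1, 7))
point_p = {w: Fraction(1, 6) for w in omega}

def P(event):
    return sum((point_p[w] for w in event), Fraction(0))

def all_events(s):
    items = sorted(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

events = all_events(omega)
assert all(0 <= P(a) <= 1 for a in events)   # condition 1)
assert P(omega) == 1                          # condition 2)
assert all(P(a | b) == P(a) + P(b)            # condition 3), two disjoint events
           for a in events for b in events if not a & b)
print(len(events), "events checked")  # 64 events checked
```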
In order to construct a mathematically rigorous theory, the domain of definition of $\mathsf{P}(A)$
must be a $\sigma$-algebra,
and condition 3)
must also be met for an
infinite
sequence of events which are mutually
exclusive (countable additivity of probabilities). Non-negativity and
countable additivity are fundamental properties of measures. Thus, probability
theory may be formally regarded as a part of
measure
theory. The fundamental concepts of probability theory are then
viewed in a new light: random variables become
measurable functions, their mathematical expectations become the abstract integrals
of Lebesgue, etc. However, the main problems of probability theory
and of measure theory are different. In probability theory, the
basic,
specific
concept is that of
independence
of events, trials and random variables. Moreover, probability
theory comprises a thorough study of
subjects such as probability distributions, conditional mathematical expectations, etc.
The following comments may be made on the scheme
described above. In accordance with the scheme, each probability model is based on a
probability space,
which is a triplet $(\Omega, \mathcal{A}, \mathsf{P})$,
where $\Omega$ is a set of elementary events,
$\mathcal{A}$ is a $\sigma$-algebra
of subsets of $\Omega$
and $\mathsf{P}$ is a probability distribution (a countably-additive normalized measure) on $\mathcal{A}$.
Two achievements of this scheme are the
definition of probabilities in infinite-dimensional spaces (in particular,
in spaces connected with infinite sequences of trials
and stochastic processes), and the general definition of
conditional probabilities
and conditional mathematical expectations (with respect
to a given random variable, etc.).
Subsequent development of probability theory showed that the above
definition of a probability space can be expediently narrowed.
These developments have led to concepts such as
perfect distributions
and probability spaces,
Blackwell spaces,
Radon probability measures
on topological (linear) spaces, etc. (see
Probability distribution).
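There are also other approaches to the fundamental concepts
of probability theory, such as axiomatization, the principal
object of which is a normalized Boolean algebra of
events. Here, the principal advantage (provided that the algebra
being considered is complete in the metric sense) consists in
the fact that for any directed system of events $A_\alpha$ the following relations are true:
$$\mathsf{P}\Big(\bigcup_\alpha A_\alpha\Big) = \lim_\alpha \mathsf{P}(A_\alpha) \quad \text{if } A_\alpha \uparrow, \qquad \mathsf{P}\Big(\bigcap_\alpha A_\alpha\Big) = \lim_\alpha \mathsf{P}(A_\alpha) \quad \text{if } A_\alpha \downarrow.$$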
It is possible to axiomatize the concept of a random variable as
an element of some commutative algebra with a positive linear functional
defined on it (the analogue of the mathematical expectation). This
is the starting point for non-commutative and quantum probability.
Limit theorems.
In a formal exposition of probability theory limit theorems
appear as a kind of superstructure over its elementary sections
in which all problems are of a finite, purely arithmetical
nature. However, the cognitive value of probability theory can only be
revealed by these limit theorems. Thus, it is shown by the
Bernoulli theorem
that the frequency of occurrence of a given event in
independent trials is usually close to its probability, while the
Laplace theorem
yields the probabilities of deviations of this frequency from its limiting value.
In a similar manner, the meaning of the characteristics of
a random variable such as its mathematical expectation and variance is explained by the
law of large numbers
and the
central limit theorem
(see also
Limit theorems
in probability theory).
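Let
$$\xi_1, \xi_2, \dots \qquad (7)$$
be independent random variables with the same probability distribution, with
$\mathsf{E}\xi_k = a$, $\mathsf{D}\xi_k = \sigma^2$,
and let $\zeta_n$ be the
arithmetical average
of the first $n$
variables of the sequence (7):
$$\zeta_n = \frac{\xi_1 + \dots + \xi_n}{n}.$$
In accordance with the law of large numbers, for any $\epsilon > 0$
the probability of the inequality $|\zeta_n - a| \le \epsilon$
tends to one as $n \to \infty$,
so that, as a rule, the value of $\zeta_n$
is close to $a$.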
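The law of large numbers can be watched in a simulation. The sketch below assumes fair-die rolls as the $\xi_k$ (so $a = 3.5$) and an arbitrarily chosen fixed seed; it prints $\zeta_n$ for increasing $n$, and the values should settle near 3.5.

```python
import random

rng = random.Random(2024)  # fixed seed for reproducibility

def sample_mean(n):
    """zeta_n: average of n independent fair-die rolls (a = E xi_k = 3.5)."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```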
This result is rendered more precise by the central
limit theorem, according to which the deviations of $\zeta_n$
from $a$
are approximately normally distributed, with mathematical expectation 0 and variance $\sigma^2/n$.
Thus, in order to calculate (to a first
approximation) the probability of some deviation of $\zeta_n$
from $a$ for large $n$,
there is no need to know the distribution of the variables $\xi_k$
in all details; knowledge of their variance is
sufficient. If a higher accuracy of approximation is required,
moments of higher order must also be used.
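The central limit statement can likewise be checked by simulation. In the sketch below the $\xi_k$ are assumed uniform on $[0, 1)$ (so $a = 1/2$ and $\sigma^2 = 1/12$); each deviation $\zeta_n - a$ is divided by $\sigma/\sqrt{n}$, and the fractions falling within one and within two units are compared with the standard normal values. Sample sizes and seed are arbitrary choices.

```python
import math
import random

rng = random.Random(7)
n, reps = 48, 20_000
a, sigma = 0.5, math.sqrt(1 / 12)  # mean and s.d. of a uniform variable on [0, 1)

def standardized_deviation():
    """(zeta_n - a) / (sigma / sqrt(n)); approximately N(0, 1) for large n."""
    zeta = sum(rng.random() for _ in range(n)) / n
    return (zeta - a) / (sigma / math.sqrt(n))

zs = [standardized_deviation() for _ in range(reps)]
within_1 = sum(abs(z) <= 1 for z in zs) / reps
within_2 = sum(abs(z) <= 2 for z in zs) / reps
normal_1 = math.erf(1 / math.sqrt(2))  # about 0.6827
normal_2 = math.erf(2 / math.sqrt(2))  # about 0.9545

print(within_1, normal_1)
print(within_2, normal_2)
```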
The above statements, with suitable modifications, may be extended
to random vectors (in finite-dimensional and in
some infinite-dimensional spaces). The independence conditions may be replaced
by conditions of a
"weak"
(in some sense) dependence of the $\xi_k$.
Limit theorems for distributions on groups, for distributions of
values of arithmetic functions, etc., are also known.
In applications, in particular in mathematical statistics and
statistical physics, it may be necessary
to approximate small probabilities (i.e. probabilities of events of the type
$|\zeta_n - a| > \epsilon$)
with a high
relative accuracy.
This involves major corrections to the normal approximation (cf.
Probability of large deviations).
It was noted in the 1920s that quite
natural non-normal limit distributions may appear even in schemes of
sequences of identically distributed and independent random variables. For instance, let $\xi_1$
be the time which elapses until some randomly
varying variable has returned to its initial location, let $\xi_2$
be the time between the first and the second such returns,
etc. Then, under very general conditions, the distribution of the sum
$\xi_1 + \dots + \xi_n$
(i.e. the time elapsing prior to the $n$-th
return) will, after multiplication by $n^{-1/\alpha}$
(where $\alpha$ is a constant smaller than one), converge to some
limit distribution. Thus, the time prior to the $n$-th
return increases, roughly speaking, in proportion to $n^{1/\alpha}$,
i.e. at a faster rate than $n$
(if the law of large numbers were applicable, it would be of order $n$).
This is seen in the case of a
Bernoulli random walk
(in which another paradoxical law, the
arcsine law,
also appears).
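For the Bernoulli random walk the mechanism can be made concrete. A classical identity states that the probability of no return to the starting point during the first $2k$ steps equals $C_{2k}^k\, 2^{-2k}$, which behaves like $(\pi k)^{-1/2}$. Such a slowly decaying tail makes the mean return time infinite, so the law of large numbers does not apply; here $\alpha = 1/2$, and the $n$-th return time grows like $n^2$. The sketch below checks the identity for small $k$ by exact dynamic programming and then the asymptotic behaviour; the function name is illustrative.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def prob_no_return(steps):
    """P(a symmetric Bernoulli walk started at 0 has not returned to 0 within `steps` steps)."""
    alive = {0: Fraction(1)}
    for _ in range(steps):
        nxt = {}
        for x, p in alive.items():
            for y in (x - 1, x + 1):
                if y != 0:  # paths that come back to 0 are discarded
                    nxt[y] = nxt.get(y, 0) + p / 2
        alive = nxt
    return sum(alive.values())

# Exact identity P(T > 2k) = C(2k, k) / 4^k for small k.
for k in (1, 2, 5, 10):
    assert prob_no_return(2 * k) == Fraction(comb(2 * k, k), 4 ** k)

# Heavy tail: P(T > 2k) * sqrt(pi * k) approaches 1.
k = 1000
tail = comb(2 * k, k) / 4 ** k
print(tail * sqrt(pi * k))
```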
The principal method of proof of limit theorems
is the method of characteristic functions (cf.
Characteristic function),
together with the related methods of Laplace transforms and of generating functions.
In a number of cases it becomes necessary to
invoke the theory of functions of a complex variable.
The mechanism of the existence of most limit relationships can
be completely understood only in the context of the theory of stochastic processes.
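The property behind the method of characteristic functions is that the characteristic function of a sum of independent variables is the product of their characteristic functions. The sketch below applies this to the standardized number of successes in $n$ Bernoulli trials with $p = 0.2$ and compares the result at $t = 1$ with $e^{-t^2/2}$, the characteristic function of the standard normal law; the gap should shrink as $n$ grows, in line with the Laplace theorem. The function name is illustrative.

```python
import cmath
import math

def cf_standardized_binomial(t, n, p):
    """Characteristic function of (S_n - n p) / sqrt(n p q), S_n a sum of n independent
    Bernoulli(p) variables: the n-th power of one centred, scaled summand's function."""
    q = 1 - p
    s = math.sqrt(n * p * q)
    one = q * cmath.exp(-1j * t * p / s) + p * cmath.exp(1j * t * q / s)
    return one ** n

t = 1.0
target = math.exp(-t * t / 2)  # characteristic function of N(0, 1) at t
errors = []
for n in (10, 100, 10_000):
    errors.append(abs(cf_standardized_binomial(t, n, 0.2) - target))
    print(n, errors[-1])
```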
Stochastic processes.
During the past few decades the need to consider stochastic processes (cf.
Stochastic process),
i.e. processes with a given probability
of their proceeding in a certain manner, arose in
certain physical and chemical investigations, along with the study of one-dimensional
and higher-dimensional random variables. The coordinate of a particle executing a
Brownian motion
may serve as an example of a stochastic process.
In probability theory a stochastic process is usually
regarded as a one-parameter family of random variables $\xi(t)$.
In most applications the parameter $t$
is time, but it may also be an arbitrary variable, and
in such cases it is usual to speak of a
random function
(if $t$ is a point in space, of a
random field).
If the parameter $t$
runs through integer values, the random function is said to be a
random sequence
(or a
time series).
While a random variable may be characterized by a distribution law, a
stochastic process may be characterized by the totality of joint distribution laws for
$\xi(t_1), \dots, \xi(t_n)$
for all possible moments of time $t_1, \dots, t_n$
and for any $n > 0$
(the so-called
finite-dimensional distributions).
The most interesting concrete results in the theory of stochastic
processes were obtained in two fields —
Markov processes and stationary stochastic processes (cf.
Markov process;
Stationary stochastic process);
the interest in martingales (cf.
Martingale)
is now also strongly increasing.
Chronologically, Markov processes were the first to be studied. A stochastic process $\xi(t)$
is said to be a
Markov process
if, for any two moments of time $t_0$ and $t_1$ ($t_0 < t_1$),
the conditional probability distribution of $\xi(t_1)$,
given all values of $\xi(t)$ for $t \le t_0$,
depends only on $\xi(t_0)$.
For this reason Markov processes are sometimes referred to as
processes without after-effect.
Markov processes are a natural extension of the deterministic
processes studied in classical physics. In deterministic processes the
state of the system at the moment of time $t_0$
uniquely determines the course of the process in the future; in Markov
processes the state of the system at the moment of time $t_0$
uniquely determines the probability distribution of the course of the process at $t > t_0$,
and this distribution cannot be altered by any information on the
course of the process prior to the moment of time $t_0$.
Just as the study of continuous deterministic processes
is reduced to differential equations involving functions which describe
the state of the system, the study of continuous Markov processes
can, to a large extent, be reduced to differential or integro-differential
equations with respect to the distribution of the probabilities of the process.
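For a continuous-time Markov process with finitely many states these equations are ordinary differential equations for the state probabilities (the Kolmogorov forward equations). The sketch below assumes a hypothetical two-state process with made-up jump rates, integrates the equations by Euler's method and compares the result with the closed-form solution.

```python
import math

# Hypothetical continuous-time Markov process with states 0 and 1:
# jumps 0 -> 1 at rate lam and 1 -> 0 at rate mu (values chosen for illustration).
lam, mu = 2.0, 1.0

def forward_euler(t_end, steps):
    """Integrate dp0/dt = -lam p0 + mu p1, dp1/dt = lam p0 - mu p1 from p0 = 1, p1 = 0."""
    p0, p1 = 1.0, 0.0
    h = t_end / steps
    for _ in range(steps):
        flow = lam * p0 - mu * p1  # net probability flow from state 0 to state 1
        p0, p1 = p0 - h * flow, p1 + h * flow
    return p0, p1

def exact_p1(t):
    """Closed-form solution of the same equations with p1(0) = 0."""
    return lam / (lam + mu) * (1 - math.exp(-(lam + mu) * t))

t = 1.5
p0_numeric, p1_numeric = forward_euler(t, 10_000)
print(p1_numeric, exact_p1(t))
```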
Another major subject in the field of stochastic processes
is the theory of stationary stochastic processes. The stationary nature
of a process, i.e. the fact that its probability relations
remain unchanged with time, imposes major restrictions on the process
and makes it possible to arrive at several important deductions based on this premise.
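A major part of the theory is based only on the
assumption of stationarity in a wide sense, viz. that the mathematical expectations
$\mathsf{E}\xi(t)$ and $\mathsf{E}\,\xi(t+\tau)\overline{\xi(t)}$
are independent of $t$.
This assumption leads (for $\mathsf{E}\xi(t) = 0$) to the so-called spectral decomposition:
$$\xi(t) = \int_{-\infty}^{\infty} e^{it\lambda}\, dz(\lambda),$$
where $z(\lambda)$ is a random function with uncorrelated increments. Methods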
of best (in the mean square) linear interpolation,
extrapolation and filtering have been developed for stationary processes.
Recently a rather large class of processes, the so-called semi-martingales,
which serves to solve problems of optimal non-linear
filtering, interpolation and extrapolation, has been isolated (cf.
Stochastic processes, prediction of;
Stochastic processes, filtering of;
Stochastic processes, interpolation of).
A substantial part of the relevant analytical apparatus is provided
by stochastic differential equations, stochastic integrals and
martingales. A distinguishing feature of a
martingale $\xi(t)$
is the fact that the conditional mathematical expectation of $\xi(t)$
is $\xi(s)$,
given the values of $\xi(u)$ for $u \le s$, $s < t$.
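In discrete time the martingale property reduces to a one-step condition, $\mathsf{E}(\xi_{n+1} \mid \xi_1, \dots, \xi_n) = \xi_n$. The sketch below verifies it exactly, over all paths of length 4, for the symmetric Bernoulli random walk $S_n$ and for $S_n^2 - n$, two standard examples of martingales; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def next_step_expectation(path, f):
    """E[f(S_{n+1}, n+1) | first n steps equal `path`] for a symmetric +-1 walk."""
    s, n = sum(path), len(path)
    return Fraction(1, 2) * (f(s + 1, n + 1) + f(s - 1, n + 1))

def walk(s, n):
    return s          # S_n itself

def compensated_square(s, n):
    return s * s - n  # S_n^2 - n

n = 4
for path in product((-1, 1), repeat=n):
    s = sum(path)
    assert next_step_expectation(path, walk) == s
    assert next_step_expectation(path, compensated_square) == s * s - n
print("martingale property verified on all", 2 ** n, "paths of length", n)
```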
The theory of stochastic processes is closely connected with the classical
problems on limit theorems for sums of random
variables. Distributions which appear as limit distributions in the
study of sums of random variables become exact distributions
of appropriate characteristics in the theory of stochastic processes.
This fact makes it possible to demonstrate many limit
theorems with the aid of these associated stochastic processes.
One may finally note that the logically unobjectionable
definition of the concepts connected with stochastic processes
within the framework of the axiomatics discussed above has
always presented and still presents a large number
of difficulties of measure-theoretic nature. These are connected,
for example, with the definition of probabilistic
continuity, differentiability, etc., of stochastic processes (cf.
Separable process).
This is why monographs on the theory of stochastic processes devote about
half their space to the analysis of the development of measure-theoretic constructions.
See also the references to entries on individual subjects of probability theory.