Suppose we have a fixed iid data sample $X_1, X_2, \dots, X_n$.
We have two choices: $H_0$ or $H_1$.
That is, the data is generated by either $p_0$ or $p_1$. Call $H_0$
the ``null'' hypothesis and $H_1$ the ``alternative''. The alternative
hypothesis indicates a disturbance is present. If we decide $H_1$, we signal an
``alarm'' for the disturbance.
We process the data by a decision function
\[
\delta(X_1, \dots, X_n) \in \{0, 1\},
\]
where $\delta = 0$ means we decide $H_0$ and $\delta = 1$ means we decide $H_1$.
We have two possible errors:
\begin{description}
\item[False Alarm:] $\delta = 1$, but $H_0$ is true.
\item[Miss:] $\delta = 0$, but $H_1$ is true.
\end{description}
In the non-Bayesian setting, we wish to choose a family of decision functions $\delta$,
which navigates the optimal tradeoff between the probabilities of miss and false alarm.
The probability of miss, $P_M$, is
\[
P_M = P(\delta = 0 \mid H_1),
\]
and the probability of false alarm, $P_F$, is
\[
P_F = P(\delta = 1 \mid H_0).
\]
We optimize the tradeoff by comparing the likelihood ratio to a
nonnegative threshold, say $\tau$:
\[
L(X_1,\dots,X_n) = \frac{p_1(X_1,\dots,X_n)}{p_0(X_1,\dots,X_n)}
\underset{H_0}{\overset{H_1}{\gtrless}} \tau.
\]
Equivalently, compare the log likelihood ratio to an arbitrary
real threshold $t = \log\tau$:
\[
\log L(X_1,\dots,X_n) = \sum_{i=1}^{n} \log\frac{p_1(X_i)}{p_0(X_i)}
\underset{H_0}{\overset{H_1}{\gtrless}} t.
\]
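The threshold test above can be sketched as follows. This is a minimal
illustration, assuming (my choice, not from the text) that $p_0 = N(0,1)$ and
$p_1 = N(1,1)$, for which the per-sample log likelihood ratio is $x - 1/2$;
the error probabilities are then estimated by simulation.

```python
import math
import random

def log_lr(sample):
    """Log likelihood ratio of an iid sample under N(1,1) vs N(0,1)."""
    # For these two Gaussians, log(p1(x)/p0(x)) = x - 1/2.
    return sum(x - 0.5 for x in sample)

def decide(sample, t=0.0):
    """Return 1 (alarm) if the log likelihood ratio exceeds t, else 0."""
    return 1 if log_lr(sample) > t else 0

# Monte Carlo estimates of the two error probabilities.
random.seed(0)
n, trials = 10, 2000
p_false_alarm = sum(
    decide([random.gauss(0, 1) for _ in range(n)]) for _ in range(trials)
) / trials
p_miss = sum(
    1 - decide([random.gauss(1, 1) for _ in range(n)]) for _ in range(trials)
) / trials
```

Raising `t` lowers the false-alarm estimate at the cost of more misses, which
is exactly the tradeoff discussed next.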
Increasing $\tau$ makes the test less ``sensitive'' to the disturbance:
we accept a higher probability of miss in return for a lower probability
of false alarm. Because of the tradeoff, there is a limit as to
how well we can do, which improves exponentially as we collect more
data. This limit relation is given by Stein's lemma.
Fix $P_F \le \epsilon$ for some $\epsilon \in (0,1)$. Then, as $n \to \infty$,
the best achievable probability of miss satisfies
\[
\lim_{n\to\infty} \frac{1}{n} \log P_M = -D(p_0 \,\|\, p_1),
\]
so for large $n$, we get:
\[
P_M \approx e^{-n D(p_0 \| p_1)}.
\]
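Stein's lemma can be checked numerically. Below is a sketch assuming (my
illustrative choice) $p_0 = N(0,1)$ and $p_1 = N(1,1)$, for which
$D(p_0\|p_1) = 1/2$: the per-sample log likelihood ratio is $X - 1/2$, so its
sum over $n$ iid samples is $N(-n/2, n)$ under $H_0$ and $N(+n/2, n)$ under
$H_1$, and both error probabilities are exact Gaussian tail areas.

```python
import math
from statistics import NormalDist

D = 0.5                     # D(p0 || p1) for this Gaussian pair
eps = 0.1                   # fixed false-alarm probability
Phi = NormalDist()
z = Phi.inv_cdf(1 - eps)

exponents = []
for n in (16, 36, 64):
    # Threshold making P_F = eps exactly: log LR ~ N(-n*D, n) under H0.
    t = -n * D + z * math.sqrt(n)
    # Exact miss probability: log LR ~ N(+n*D, n) under H1.
    p_miss = Phi.cdf((t - n * D) / math.sqrt(n))
    exponents.append(-math.log(p_miss) / n)   # should approach D
```

The exponents climb toward $D = 1/2$ but slowly, since the threshold carries
an $O(\sqrt{n})$ correction; the limit is only reached asymptotically.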
The quantity $D(p_0 \,\|\, p_1)$ is the
Kullback-Leibler distance, or the expected value of the
log likelihood ratio $\log(p_0/p_1)$ under $p_0$. We define, where $p$ and $q$ are densities:
\[
D(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx .
\]
The following facts about Kullback-Leibler distance hold:
\begin{enumerate}
\item $D(p \,\|\, q) \ge 0$. Equality holds when $p = q$ except on
a set of $p$-measure zero. That is, for a continuous sample
space you can allow differences on sets of Lebesgue measure zero;
for a discrete space you cannot allow any difference.
\item $D(p \,\|\, q) \ne D(q \,\|\, p)$, in general. So the K-L distance is not
a metric. The triangle inequality also fails.
\end{enumerate}
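Both facts are easy to check numerically for discrete distributions. This is a
minimal sketch; the particular distributions `p` and `q` are my own
illustrative choices.

```python
import math

def kl(p, q):
    """Kullback-Leibler distance D(p || q) for discrete distributions."""
    # Terms with p_i = 0 contribute 0 by the convention 0*log(0) = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

d_pq = kl(p, q)   # positive, since p != q
d_qp = kl(q, p)   # different from d_pq: K-L is not symmetric
d_pp = kl(p, p)   # zero: equality case of fact 1
```

Here `d_pq` and `d_qp` differ noticeably, which is all it takes to rule out
the metric axioms.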
When $p_\theta$ and $p_{\theta'}$ belong to the same parametric family, we adopt the
shorthand $D(\theta \,\|\, \theta')$ rather than
$D(p_\theta \,\|\, p_{\theta'})$. Then we have an additional
fact. When hypotheses are ``close'', K-L distance behaves approximately like
the square of the Euclidean metric in parameter ($\theta$)-space.
Specifically:
\[
D(\theta \,\|\, \theta') \approx
\frac{1}{2} (\theta - \theta')^T J(\theta) (\theta - \theta'),
\]
where $J(\theta)$ is the Fisher information. The right hand side
is sometimes called the square of the Mahalanobis distance.
Furthermore, we may assume the hypotheses are ``close'' enough
that $J(\theta) \approx J(\theta')$. Then, K-L information
is also approximately symmetric:
\[
D(\theta \,\|\, \theta') \approx D(\theta' \,\|\, \theta).
\]
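The quadratic approximation can be verified in closed form for the
Bernoulli($\theta$) family (my illustrative example), where the exact K-L
distance and the Fisher information $J(\theta) = 1/(\theta(1-\theta))$ are
both available.

```python
import math

def kl_bernoulli(a, b):
    """Exact K-L distance between Bernoulli(a) and Bernoulli(b)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

theta, theta2 = 0.5, 0.52          # two "close" hypotheses
exact = kl_bernoulli(theta, theta2)
J = 1.0 / (theta * (1 - theta))    # Fisher information of Bernoulli(theta)
approx = 0.5 * J * (theta2 - theta) ** 2
reverse = kl_bernoulli(theta2, theta)
```

For this pair the quadratic form matches the exact distance to well under one
percent, and the forward and reverse distances nearly coincide, as claimed.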
Practically, there is still the problem of choosing $\tau$, or of
choosing ``desirable'' probabilities of miss and false alarm which
obey Stein's lemma, which also fixes the data size $n$. We can solve
for $\tau$ given the error probabilities. However, it is often
``unnatural'' to specify these probabilities; instead, we are concerned
with other, observable effects on the system. Hence, the usual scenario
results in a lot of lost sleep, as we continually vary $\tau$, run
simulations, and then observe some distant outcome.
Fortunately, the Bayesian approach comes to the rescue. Instead
of optimizing a probability tradeoff, we assign costs: $C_M$
to a miss event and $C_F$ to a false alarm event.
Additionally, we have a prior distribution on the hypotheses:
$\pi_0 = P(H_0)$ and $\pi_1 = P(H_1)$. Let $\delta$
be the decision function as before.
The Bayes risk, or expected cost, is as follows:
\[
R(\delta) = C_F \, \pi_0 \, P(\delta = 1 \mid H_0)
          + C_M \, \pi_1 \, P(\delta = 0 \mid H_1).
\]
It follows that the optimum-Bayes risk decision also involves
comparing the likelihood ratio to a threshold:
\[
L(X_1,\dots,X_n) = \frac{p_1(X_1,\dots,X_n)}{p_0(X_1,\dots,X_n)}
\underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0 C_F}{\pi_1 C_M}.
\]
We see the threshold is available in closed form, as a function of costs
and priors.
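The closed-form threshold can be checked numerically. Below is a sketch
assuming (my illustrative choice) $p_0 = N(0,1)$, $p_1 = N(1,1)$, and a single
observation; the likelihood ratio test $p_1(x)/p_0(x) > \tau$ then reduces to
$x > 1/2 + \log\tau$, so the Bayes risk of every threshold rule is a sum of
Gaussian tails and can be minimized by grid search.

```python
import math
from statistics import NormalDist

Phi = NormalDist()
pi0, pi1 = 0.8, 0.2        # prior probabilities (illustrative)
C_F, C_M = 1.0, 5.0        # costs of false alarm and miss (illustrative)

tau = (pi0 * C_F) / (pi1 * C_M)   # closed-form Bayes threshold on the LR
x_star = 0.5 + math.log(tau)      # equivalent threshold on x itself

def risk(x_thresh):
    """Bayes risk of the rule 'alarm iff x > x_thresh'."""
    p_fa = 1 - Phi.cdf(x_thresh)      # P(delta=1 | H0), x ~ N(0,1)
    p_miss = Phi.cdf(x_thresh - 1)    # P(delta=0 | H1), x ~ N(1,1)
    return C_F * pi0 * p_fa + C_M * pi1 * p_miss

# Grid search over thresholds; the minimizer should sit at (about) x_star.
grid = [i / 1000 for i in range(-2000, 2001)]
x_best = min(grid, key=risk)
```

The grid minimizer agrees with the closed-form threshold, confirming that no
simulation loop over $\tau$ is needed once costs and priors are specified.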