Bayesian Hypothesis Testing

Suppose we have a fixed iid data sample $y_{1:N} \sim f_{\theta}(y)$ . We have two choices: $\theta = \theta_{0}$ or $\theta = \theta_{1}$ . That is, the data $y_{1:N}$ is generated by either $\theta_{0}$ or $\theta_{1}$ . Call $\theta_{0}$ the ``null'' hypothesis and $\theta_{1}$ the ``alternative''. The alternative hypothesis indicates a disturbance is present. If we decide $\theta = \theta_{1}$ , we signal an ``alarm'' for the disturbance.

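We process the data with a decision function $D(y_{1:N})$ , which takes the value $0$ when we decide $\theta = \theta_{0}$ and $1$ when we decide $\theta = \theta_{1}$ :

$\displaystyle D(y_{1:N}) = \begin{cases} 0, & \text{decide } \theta = \theta_{0} \\ 1, & \text{decide } \theta = \theta_{1} \end{cases}$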

We have two possible errors:

False Alarm: $\theta = \theta_{0}$ , but $D(y_{1:N}) = 1$
Miss: $\theta = \theta_{1}$ , but $D(y_{1:N}) = 0$

In the non-Bayesian setting, we wish to choose a family of decision functions $D(\cdot)$ that navigates the optimal tradeoff between the probabilities of miss and false alarm.

The probability of miss, $P_{M}$ , is $P(D(y_{1:N}) = 0 \vert \theta = \theta_{1})$ and the probability of false alarm, $P_{FA}$ , is $P(D(y_{1:N}) = 1 \vert \theta = \theta_{0})$ .

We optimize the tradeoff by comparing the likelihood ratio to a positive threshold, say $\exp(T) > 0$ , where $f_{\theta}(y)$ now denotes the joint density of the full sample $y = y_{1:N}$ :

$\displaystyle D_{*}(y_{1:N}) = 1_{\frac{f_{\theta_{1}}(y)}{f_{\theta_{0}}(y)} > \exp(T)}$

Equivalently, compare the log likelihood ratio to an arbitrary real threshold $T$ :

$\displaystyle D_{*}(y_{1:N}) = 1_{ \log \frac{f_{\theta_{1}}(y)}{f_{\theta_{0}}(y)} > T}$
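As a minimal sketch of the log likelihood ratio test, assume the samples are iid Gaussian with a known common standard deviation and that the two hypotheses differ only in the mean. The Gaussian model, the function names, and the sample values are illustrative assumptions, not part of the original development.

```python
def gaussian_log_likelihood_ratio(y, mu0, mu1, sigma):
    """Log of f_{theta1}(y) / f_{theta0}(y) for iid N(mu, sigma^2) samples."""
    # With a common variance the normalizing constants cancel, so each
    # sample contributes a difference of squared deviations.
    return sum(
        ((yi - mu0) ** 2 - (yi - mu1) ** 2) / (2.0 * sigma ** 2) for yi in y
    )


def decide(y, mu0, mu1, sigma, T):
    """Return 1 (alarm) if the log likelihood ratio exceeds T, else 0."""
    return 1 if gaussian_log_likelihood_ratio(y, mu0, mu1, sigma) > T else 0


# Hypothetical data: four samples, null mean 0, alternative mean 1.
y = [0.9, 1.2, 0.4, 1.1]
llr = gaussian_log_likelihood_ratio(y, mu0=0.0, mu1=1.0, sigma=1.0)
print(llr, decide(y, mu0=0.0, mu1=1.0, sigma=1.0, T=0.0))
```

Because the samples are independent, the log likelihood ratio is a sum of per-sample terms, which is why working on the log scale is convenient.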

Increasing $T$ makes the test less ``sensitive'' to the disturbance: we accept a higher probability of miss in return for a lower probability of false alarm. Because of the tradeoff, there is a limit to how well we can do, and this limit improves exponentially as we collect more data. The relation is given by Stein's lemma. Fix $P_{M} = \epsilon$ . Then, as $\epsilon \rightarrow 0$ and for large $N$ , the smallest achievable $P_{FA}$ satisfies:

$\displaystyle \frac{1}{N} \log P_{FA} \rightarrow -K(f_{\theta_{1}}, f_{\theta_{0}})$

The quantity $K(f_{\theta_{1}}, f_{\theta_{0}})$ is the Kullback-Leibler distance: the expected value, under $\theta_{1}$ , of the single-sample log likelihood ratio. For densities $f$ and $g$ , we define:

$\displaystyle K(f,g) = E_{f}\left[\log(f/g)\right]$
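Stein's lemma can be checked numerically in a case where everything is available in closed form. The sketch below assumes iid Gaussian samples with unit variance whose means under the two hypotheses differ by $d$ , so that the K-L distance between the hypotheses equals $d^{2}/2$ . The function name `false_alarm_at_fixed_miss` and the values of $d$ , $\epsilon$ and $N$ are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def false_alarm_at_fixed_miss(N, d, eps):
    """P_FA of the likelihood ratio test whose threshold gives P_M = eps.

    Assumes N iid unit-variance Gaussian samples whose means under the
    two hypotheses differ by d (an illustrative model).
    """
    # The log likelihood ratio is Gaussian under either hypothesis. Setting
    # the threshold so that P_M = eps yields
    #   P_FA = Q(sqrt(N) * d + Phi^{-1}(eps)),
    # where Q is the upper tail probability of a standard normal.
    x = math.sqrt(N) * d + NormalDist().inv_cdf(eps)
    return 0.5 * math.erfc(x / math.sqrt(2.0))


d, eps = 0.5, 0.05
kl = d ** 2 / 2.0  # K-L distance for this Gaussian pair
for N in (100, 1000, 4000):
    rate = math.log(false_alarm_at_fixed_miss(N, d, eps)) / N
    print(N, rate, -kl)
```

The printed rate approaches $-d^{2}/2$ only slowly, since the correction term decays on the order of $1/\sqrt{N}$ ; the lemma describes the limiting exponent, not a finite-sample guarantee.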

The following facts about Kullback-Leibler distance hold:

$K(f,g) \ge 0$ . Equality holds if and only if $f \equiv g$ except on a set of $f$-measure zero. That is, for a continuous sample space the densities may differ on sets of Lebesgue measure zero; for a discrete space they may not differ at all.
$K(f,g) \neq K(g,f)$ , in general. So the K-L distance is not a metric. The triangle inequality also fails.
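These two facts are easy to verify on a small discrete example. The sketch below uses the convention $K(f,g) = E_{f}[\log(f/g)]$ , under which the distance is nonnegative; the function name `kl` and the two distributions are illustrative assumptions.

```python
import math


def kl(f, g):
    """K(f, g) = E_f[log(f / g)] for discrete distributions given as lists."""
    total = 0.0
    for fi, gi in zip(f, g):
        if fi > 0.0:
            # f puts mass where g puts none: the distance is infinite.
            if gi == 0.0:
                return math.inf
            total += fi * math.log(fi / gi)
        # Outcomes with f_i == 0 contribute nothing to the expectation.
    return total


f = [0.5, 0.5]
g = [0.9, 0.1]
print(kl(f, g), kl(g, f), kl(f, f))
```

Here `kl(f, g)` and `kl(g, f)` are both positive but differ, while `kl(f, f)` is zero.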

When $f$ and $g$ belong to the same parametric family, we adopt the shorthand $K(\theta_{0}, \theta_{1})$ rather than $K(f_{\theta_{0}}, f_{\theta_{1}})$ . Then we have an additional fact: when the hypotheses are ``close'', K-L distance behaves approximately like the square of a Euclidean metric in parameter ( $\theta$ ) space. Specifically:

$\displaystyle 2K(\theta_{0}, \theta_{1}) \approx (\theta_{1} - \theta_{0})'J(\theta_{0})(\theta_{1} - \theta_{0}).$

where $J(\theta_{0})$ is the Fisher information. The right hand side is sometimes called the square of the Mahalanobis distance.

Furthermore, we may assume the hypotheses are ``close'' enough that $J(\theta_{0}) \approx J(\theta_{1})$ . Then the K-L distance is also approximately symmetric.
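The quadratic approximation can be checked on a one-parameter family. The sketch below assumes a Bernoulli model, whose Fisher information for a single observation is $1/(p(1-p))$ , and uses the convention $K(\theta_{0}, \theta_{1}) = E_{\theta_{0}}[\log(f_{\theta_{0}}/f_{\theta_{1}})]$ . The function names and parameter values are hypothetical.

```python
import math


def kl_bernoulli(p0, p1):
    """K(theta0, theta1) for Bernoulli(p0) versus Bernoulli(p1)."""
    return p0 * math.log(p0 / p1) + (1.0 - p0) * math.log((1.0 - p0) / (1.0 - p1))


def fisher_bernoulli(p):
    """Fisher information of a single Bernoulli(p) observation."""
    return 1.0 / (p * (1.0 - p))


p0 = 0.3
for delta in (0.1, 0.01, 0.001):
    p1 = p0 + delta
    exact = 2.0 * kl_bernoulli(p0, p1)        # left-hand side, 2K
    quad = delta ** 2 * fisher_bernoulli(p0)  # right-hand side, quadratic form
    asym = kl_bernoulli(p0, p1) / kl_bernoulli(p1, p0)
    print(delta, exact / quad, asym)
```

Both printed ratios move toward 1 as `delta` shrinks, which is the sense in which the distance becomes quadratic and symmetric for ``close'' hypotheses.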

In practice there remains the problem of choosing $T$ , or of choosing ``desirable'' probabilities of miss and false alarm that obey Stein's lemma, which also determines the data size. We can solve for $T$ given the error probabilities. However, it is often ``unnatural'' to specify these probabilities; instead, we are concerned with other, observable effects on the system. Hence, the usual scenario results in a lot of lost sleep, as we continually vary $T$ , run simulations, and then observe some distant outcome.

Fortunately, the Bayesian approach comes to the rescue. Instead of optimizing a probability tradeoff, we assign costs: $C_{M} > 0$ to a miss event and $C_{FA} > 0$ to a false alarm event. Additionally, we have a prior distribution on $\theta$ :

$\displaystyle P(\theta = \theta_{1}) = \pi_{1}$

Let $D(y_{1:N})$ be the decision function as before. The Bayes risk, or expected cost, weights each error probability by its cost and by the prior probability of the corresponding hypothesis:

$\displaystyle R(D) = C_{M} \pi_{1} P\left(D(y_{1:N}) = 0 \vert \theta = \theta_{1}\right) + C_{FA} (1 - \pi_{1}) P\left(D(y_{1:N}) = 1 \vert \theta = \theta_{0}\right)$

It follows that the decision minimizing the Bayes risk also compares the likelihood ratio to a threshold:

$\displaystyle D(y_{1:N}) = 1_{\frac{P(y \vert \theta_{1})}{P(y \vert \theta_{0})} > \frac{C_{FA}P(\theta_{0})}{C_{M}P(\theta_{1})}}$

$\displaystyle \phantom{D(y_{1:N})} = 1_{\frac{f_{\theta_{1}}(y)}{f_{\theta_{0}}(y)} > \frac{C_{FA}(1-\pi_{1})}{C_{M}\pi_{1}}}$
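A brief sketch of why the threshold takes this form, assuming the risk weights each error by its cost and prior probability: given the data, deciding $D = 1$ incurs posterior expected cost $C_{FA} P(\theta_{0} \vert y)$ , while deciding $D = 0$ incurs $C_{M} P(\theta_{1} \vert y)$ . The risk is minimized by taking the cheaper decision for each $y$ , so we decide $D = 1$ when

$\displaystyle C_{M} P(\theta_{1} \vert y) > C_{FA} P(\theta_{0} \vert y)$

Applying Bayes' rule and cancelling the common factor $P(y)$ gives

$\displaystyle C_{M} \pi_{1} f_{\theta_{1}}(y) > C_{FA} (1 - \pi_{1}) f_{\theta_{0}}(y)$

and dividing both sides by $C_{M} \pi_{1} f_{\theta_{0}}(y)$ yields the likelihood ratio test above.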

We see the threshold is available in closed form, as a function of costs and priors.
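As a minimal sketch, the closed-form threshold can be computed on the log scale and compared directly against a log likelihood ratio. The function names and the example costs and prior are hypothetical values chosen for illustration.

```python
import math


def bayes_log_threshold(c_fa, c_m, pi1):
    """Log of the Bayes threshold C_FA (1 - pi1) / (C_M pi1)."""
    if not (0.0 < pi1 < 1.0) or c_fa <= 0.0 or c_m <= 0.0:
        raise ValueError("need 0 < pi1 < 1 and positive costs")
    return math.log(c_fa * (1.0 - pi1)) - math.log(c_m * pi1)


def bayes_decide(log_likelihood_ratio, c_fa, c_m, pi1):
    """Return 1 (alarm) if the log likelihood ratio exceeds the Bayes threshold."""
    return 1 if log_likelihood_ratio > bayes_log_threshold(c_fa, c_m, pi1) else 0


# A rare disturbance (prior 0.01) whose miss costs ten times a false alarm.
T = bayes_log_threshold(c_fa=1.0, c_m=10.0, pi1=0.01)
print(T, bayes_decide(3.0, 1.0, 10.0, 0.01), bayes_decide(2.0, 1.0, 10.0, 0.01))
```

With equal costs and $\pi_{1} = 1/2$ the log threshold is zero, so the test reduces to choosing the hypothesis with the larger likelihood.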