- The use of a deterministic least squares criterion
has an important statistical interpretation when
the observations are Gauss-Markov. A ``Gauss-Markov'' process results
from a finite-length all-pole filter driven by white Gaussian noise,
as in (1), where $A_1, \ldots, A_p$ are the ``filter coefficients''
and $e_t$ is the ``noise''. The ``Markov'' property refers to the
finite memory of the filter, and means that given the $p$th-order
past $y_{t-1}, \ldots, y_{t-p}$, the current observation
$y_t$ is independent of the further past
$y_{t-p-1}, y_{t-p-2}, \ldots$.
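As an illustration of the process described above, the following is a minimal sketch that simulates a scalar Gauss-Markov process. It assumes the sign convention $y_t = \sum_k a_k y_{t-k} + e_t$ and zero initial conditions; neither is confirmed by (1), and the function name `simulate_ar` and its parameters are hypothetical.

```python
import random


def simulate_ar(coeffs, sigma, n, seed=0):
    """Simulate n samples of a scalar AR(p) process.

    Assumed convention (not confirmed by the source):
        y_t = a_1*y_{t-1} + ... + a_p*y_{t-p} + e_t,
    with e_t white Gaussian noise of standard deviation sigma.
    """
    rng = random.Random(seed)
    p = len(coeffs)
    y = [0.0] * p  # zero initial conditions (an assumption)
    for _ in range(n):
        # p-th order past, most recent first: y_{t-1}, ..., y_{t-p}
        past = y[-1:-p - 1:-1]
        e = rng.gauss(0.0, sigma)
        y.append(sum(a * v for a, v in zip(coeffs, past)) + e)
    return y[p:]  # drop the artificial initial conditions


samples = simulate_ar([0.5, -0.2], sigma=1.0, n=1000)
```

Each new sample is built only from the $p$ most recent samples and a fresh noise draw, which is the finite-memory (``Markov'') property in code form.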
- Asymptotic efficiency of maximum likelihood
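From the statistical interpretation, (1) gives a
parametric form for the joint density of the observations $y_1, \ldots, y_N$. We wish to estimate
the unknown parameters, $A_1, \ldots, A_p$, which can be thought of
as a block vector, say $\theta$. A good, or ``efficient'', estimate $\hat{\theta}$
is a function of the observations with the
following properties:
- Unbiased: $E[\hat{\theta}] = \theta$
- Minimum variance: $\mathrm{Cov}(\hat{\theta}) = E[(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T]$
is PSD-minimal, that is, $\mathrm{Cov}(\tilde{\theta}) - \mathrm{Cov}(\hat{\theta})$ is positive semidefinite for any other unbiased estimate $\tilde{\theta}$
By the Cramér-Rao inequality, the maximum likelihood (ML) estimate:
$$
\hat{\theta}_{ML} = \arg\max_{\theta} \, p(y_1, \ldots, y_N \mid \theta)
$$
(where $p(\cdot \mid \theta)$ gives the joint density) is asymptotically efficient
as $N \to \infty$.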
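- Computing the ML estimate in the Gauss-Markov case, we have:
$$
\begin{aligned}
\hat{\theta}_{ML} &= \arg\max_{\theta} \, p(y_1, \ldots, y_N \mid \theta) \\
&= \arg\max_{\theta} \prod_{t=1}^{N} p(y_t \mid y_{t-1}, \ldots, y_1, \theta) \\
&\approx \arg\max_{\theta} \prod_{t=p+1}^{N} p(y_t \mid y_{t-1}, \ldots, y_{t-p}, \theta) \\
&= \arg\max_{\theta} \prod_{t=p+1}^{N} p(e_t \mid y_{t-1}, \ldots, y_{t-p}, \theta) \\
&= \arg\max_{\theta} \prod_{t=p+1}^{N} p(e_t)
\end{aligned}
\tag{10}
$$
Here $e_t$ is the noise implied by (1) for the given $\theta$ and observations.
The first step is the chain rule for joint densities.
The second step is justified by the Markov property: conditional
independence allows us to drop conditioning on the further past
$y_{t-p-1}, \ldots, y_1$.
The $\approx$ is due to the absence of data for
$t \le 0$: the first $p$ factors cannot be conditioned on a full $p$th-order past and are omitted. These
``edge effects'' wash out for large
$N$. The third step comes from the fact that, conditional
on
$y_{t-1}, \ldots, y_{t-p}$,
$y_t$ and
$e_t$ differ by a constant; thus the
Jacobian for the change of variables $y_t \to e_t$
is the identity.
Finally, the last step results from independence of the
$e_t$ and
the fact that each $y_s$
depends only on the present and past noise values
$e_s, e_{s-1}, \ldots$, so that $e_t$ is independent of $y_{t-1}, \ldots, y_{t-p}$.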
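The final form of the likelihood, a product of noise densities, can be sketched numerically for the scalar case. The sketch below assumes the convention $y_t = \sum_k a_k y_{t-k} + e_t$ and a known noise standard deviation `sigma`; the function names are hypothetical and only illustrate the structure of the derivation.

```python
import math


def residuals(y, coeffs):
    """Noise values e_t implied by the data for t = p+1, ..., N.

    Assumed convention: e_t = y_t - sum_k a_k * y_{t-k}.
    """
    p = len(coeffs)
    return [y[t] - sum(coeffs[k] * y[t - 1 - k] for k in range(p))
            for t in range(p, len(y))]


def conditional_log_likelihood(y, coeffs, sigma):
    """Log of the product of Gaussian densities p(e_t), edge terms omitted."""
    log_norm = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(log_norm - 0.5 * (e / sigma) ** 2
               for e in residuals(y, coeffs))


# A noiseless AR(1) sequence with a_1 = 0.5 has zero residuals.
y_demo = [1.0, 0.5, 0.25]
ll_demo = conditional_log_likelihood(y_demo, [0.5], sigma=1.0)
```

Because the first $p$ samples have no complete past, the sum starts at $t = p+1$, which is the edge effect mentioned above.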
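- Now to maximize the likelihood, it is equivalent to minimize the
negative log likelihood,
$J(\theta) = -\log p(y_1, \ldots, y_N \mid \theta)$. From
(10) and the form of the Gaussian likelihood:
$$
J(\theta) \approx \frac{N-p}{2} \log \det(2\pi\Sigma) + \frac{1}{2} \sum_{t=p+1}^{N} e_t^T \Sigma^{-1} e_t
$$
where $\Sigma$ denotes the covariance of the noise $e_t$.
The first term is constant w.r.t.
$\theta$ and may be dropped. The second term
is equivalent to the weighted-norm criterion (4)
with
weighting $\Sigma^{-1}$, up to a constant factor that does not affect the minimizer. Hence in the Gauss-Markov case, the deterministic
least-squares criterion approaches the ML criterion for large
$N$, giving an
asymptotically efficient estimate of the
$A_k$.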
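To make the equivalence concrete, here is a small sketch for the scalar AR(1) case, assuming the convention $y_t = a\,y_{t-1} + e_t$ and a known, constant noise variance. Under those assumptions the negative log likelihood differs from the sum of squared residuals only by a constant and a positive scale factor, so the least-squares coefficient is also the (conditional) ML coefficient. The function names are hypothetical.

```python
import random


def fit_ar1_least_squares(y):
    """Closed-form minimizer of sum_t (y_t - a*y_{t-1})^2 over a."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den


def sum_squared_residuals(y, a):
    """Least-squares criterion; proportional to the a-dependent part of
    the negative log likelihood when the noise variance is fixed."""
    return sum((y[t] - a * y[t - 1]) ** 2 for t in range(1, len(y)))


# Simulate an AR(1) process with true coefficient 0.8 and unit noise variance.
rng = random.Random(0)
y = [0.0]
for _ in range(5000):
    y.append(0.8 * y[-1] + rng.gauss(0.0, 1.0))

a_hat = fit_ar1_least_squares(y)
```

With a few thousand samples, `a_hat` should land close to the true coefficient, and the estimate tightens as the record length grows, in line with the asymptotic argument above.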