
Linear Prediction Spectral Envelope

Linear Prediction (LP) implicitly computes a spectral envelope that is well adapted for audio work, provided the order of the predictor is appropriately chosen. Because of the error criterion that LP minimizes, spectral peaks are emphasized in the envelope, as they are in the auditory system. (The peak-emphasis of LP is quantified in (11.10) below.)

The term ``linear prediction'' refers to the process of predicting a signal sample $ y(n)$ based on $ M$ past samples:

$\displaystyle y(n) = -a_1 y(n-1) - a_2 y(n-2) - \cdots - a_M y(n-M) + e(n)$ (11.4)

We call $ M$ the order of the linear predictor, and $ \{a_i\}_{i=1}^M$ the prediction coefficients. The prediction error (or ``innovations sequence'' [114]) is denoted $ e(n)$ in (11.4), and it represents all new information entering the signal $ y$ at time $ n$ . Because the information is new, $ e(n)$ is ``unpredictable.'' The predictable component of $ y(n)$ contains no new information.
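As a minimal sketch of the difference equation (11.4), the following Python fragment synthesizes $ y(n)$ from a white-noise innovations sequence and then recovers that sequence by subtracting the prediction from each sample. The coefficient values, the Gaussian noise source, the zero initial state, and the function names `synthesize` and `prediction_error` are illustrative assumptions, not taken from the text.

```python
import random

def synthesize(a, e):
    """Run y(n) = -a_1 y(n-1) - ... - a_M y(n-M) + e(n) from zero initial state.

    `a` holds [a_1, ..., a_M]; `e` is the innovations sequence.
    """
    y = []
    for n, e_n in enumerate(e):
        # Weighted sum of the past samples that exist so far.
        past = sum(a_i * y[n - i] for i, a_i in enumerate(a, start=1) if n - i >= 0)
        y.append(-past + e_n)
    return y

def prediction_error(a, y):
    """Recover e(n) = y(n) + a_1 y(n-1) + ... + a_M y(n-M)."""
    return [
        y[n] + sum(a_i * y[n - i] for i, a_i in enumerate(a, start=1) if n - i >= 0)
        for n in range(len(y))
    ]

random.seed(0)
a = [-1.5, 0.9]                                   # hypothetical order-2 predictor
e = [random.gauss(0.0, 1.0) for _ in range(200)]  # white-noise innovations
y = synthesize(a, e)
e_rec = prediction_error(a, y)
max_diff = max(abs(u - v) for u, v in zip(e, e_rec))
```

When the true coefficients are known, `e_rec` matches `e` up to floating-point rounding, which illustrates that the prediction error carries exactly the part of $ y(n)$ that the past samples do not determine.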

Taking the z transform of (11.4) yields

$\displaystyle Y(z) = \frac{E(z)}{A(z)}$ (11.5)

where $ A(z) = 1 + a_1z^{-1} + \cdots + a_M z^{-M}$ . In signal modeling by linear prediction, we are given the signal $ y(n)$ but not the prediction coefficients $ a_i$ . We must therefore estimate them. Let $ {\hat A}(z) = 1 + {\hat a}_1z^{-1} + \cdots + {\hat a}_M z^{-M}$ denote the polynomial with estimated prediction coefficients $ {\hat a}_i$ . Then we have

$\displaystyle Y(z) = \frac{{\hat E}(z)}{{\hat A}(z)}$ (11.6)

where $ {\hat E}(z)$ denotes the estimated prediction-error z transform. By minimizing $ \Vert\,{\hat E}\,\Vert_2$ , we define a least-squares estimate $ {\hat A}$ . In other words, the linear prediction coefficients $ {\hat a}_i$ are defined as those which minimize the sum of squared prediction errors $ {\hat e}(n)$

$\displaystyle \left\Vert\,{\hat e}\,\right\Vert _2^2 = \sum_n {\hat e}^2(n)$ (11.7)

over some range of $ n$ , typically an interval over which the signal is stationary (defined in Chapter 6). It turns out that this minimization results in maximally flattening the prediction-error spectrum $ E(z)$ [11,157,162]. That is, the optimal $ {\hat A}(z)$ is a whitening filter (also called an inverse filter). This makes sense in terms of Chapter 6 when one considers that a flat power spectral density corresponds to white noise in the time domain, and only white noise is completely unpredictable from one sample to the next. A non-flat spectrum corresponds to a nonzero correlation between two signal samples separated by some nonzero time interval.
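The text does not say how the minimization is carried out. One common choice, assumed here, is the autocorrelation method: estimate the autocorrelation of $ y$ and solve the resulting normal equations with the Levinson-Durbin recursion. The sketch below applies it to a synthetic order-2 signal and compares the lag-1 correlation of the signal with that of the residual as a rough whiteness check. The test signal, the seed, and all function names are illustrative assumptions.

```python
import random

def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns ([1, a_1, ..., a_M], prediction-error power).
    """
    a = [1.0]
    err = r[0]
    for m in range(1, order + 1):
        # a[0] = 1, so this sum includes the r[m] term.
        acc = sum(a[i] * r[m - i] for i in range(m))
        k = -acc / err                      # reflection coefficient
        padded = a + [0.0]
        a = [padded[i] + k * padded[m - i] for i in range(m + 1)]
        err *= 1.0 - k * k
    return a, err

def inverse_filter(a_full, y):
    """Apply A(z) to y: e(n) = y(n) + a_1 y(n-1) + ... + a_M y(n-M)."""
    return [sum(a_i * y[n - i] for i, a_i in enumerate(a_full) if n - i >= 0)
            for n in range(len(y))]

def lag1_corr(x):
    """Normalized autocorrelation at lag 1 (near zero for white noise)."""
    r = autocorr(x, 1)
    return r[1] / r[0]

# Synthetic test signal: unit-variance white noise through 1/A(z).
random.seed(1)
a_true = [1.0, -1.5, 0.9]                   # hypothetical A(z) coefficients
y = []
for n in range(4000):
    e_n = random.gauss(0.0, 1.0)
    y.append(e_n - sum(a_true[i] * y[n - i] for i in range(1, 3) if n - i >= 0))

a_hat, err_power = levinson_durbin(autocorr(y, 2), 2)
e_hat = inverse_filter(a_hat, y)
rho_y = lag1_corr(y)        # strongly correlated signal
rho_e = lag1_corr(e_hat)    # residual after whitening
```

For this signal `a_hat` should land close to `a_true`, `rho_y` should be large, and `rho_e` should be near zero, consistent with $ {\hat A}(z)$ acting as a whitening filter. Other formulations, such as the covariance method, minimize the same error over a different range of $ n$ .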

If the prediction-error is successfully whitened, then the signal model can be expressed in the frequency domain as

$\displaystyle S_y(\omega) = \frac{\sigma^2_e}{\vert A(\omega)\vert^2}$ (11.8)

where $ S_y(\omega)$ denotes the power spectral density of $ y$ (defined in Chapter 6), $ A(\omega)$ is shorthand for $ A(z)$ evaluated on the unit circle $ z=e^{j\omega}$ , and $ \sigma_e^2$ denotes the variance of the (white-noise) prediction error $ e(n)$ . Thus, the spectral magnitude envelope may be defined as

$\displaystyle \mathrm{EnvelopeLPC}_y(\omega) = \frac{\sigma_e}{\vert A(\omega)\vert}$ (11.9)
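The envelope (11.9) can be evaluated numerically by computing $ A(z)$ at points $ z=e^{j\omega}$ on the unit circle. The following is a minimal sketch under assumed values: the coefficients, $ \sigma_e = 1$ , the frequency grid, and the function name `lpc_envelope` are all hypothetical choices made for illustration.

```python
import cmath
import math

def lpc_envelope(a, sigma_e, omegas):
    """Return sigma_e / |A(e^{j omega})| for each omega in radians per sample.

    `a` holds [a_1, ..., a_M]; the leading 1 of A(z) is implicit.
    """
    env = []
    for w in omegas:
        A = 1.0 + sum(a_i * cmath.exp(-1j * w * i)
                      for i, a_i in enumerate(a, start=1))
        env.append(sigma_e / abs(A))
    return env

a = [-1.5, 0.9]          # hypothetical order-2 coefficients (one resonance)
sigma_e = 1.0            # assumed prediction-error standard deviation
K = 513
omegas = [math.pi * k / (K - 1) for k in range(K)]   # 0 to pi inclusive
env = lpc_envelope(a, sigma_e, omegas)

# Frequency at which the envelope is largest.
peak_omega = omegas[env.index(max(env))]
```

With these values the envelope has a single peak near the angle of the complex pole pair of $ 1/A(z)$ , and it falls off toward $ \omega = 0$ and $ \omega = \pi$ . In practice the coefficients and $ \sigma_e$ would come from the least-squares fit rather than being chosen by hand.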


``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2015-02-01 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University