
Linear Predictive Coding of Speech

Approximately a decade after the Kelly-Lochbaum voice model was developed, Linear Predictive Coding (LPC) of speech began [20,298,299]. The linear-prediction voice model is best classified as a parametric, spectral, source-filter model, in which the short-time spectrum is decomposed into a flat excitation spectrum multiplied by a smooth spectral envelope capturing primarily vocal formants (resonances).
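As a rough illustration of the source-filter decomposition described above, the following Python sketch estimates an all-pole envelope with the autocorrelation method and the Levinson-Durbin recursion. It is a minimal sketch, not the method of the cited references: the function names `lpc_autocorrelation` and `lpc_envelope_db` are hypothetical, the sign convention $A(z) = 1 + a_1 z^{-1} + \cdots + a_p z^{-p}$ is an assumption, and windowing and framing of the speech signal are left out.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err, refl): a[0..order] with a[0] = 1 are the coefficients
    of A(z) = 1 + a1 z^-1 + ... (assumed sign convention), err is the final
    prediction-error energy, and refl holds the reflection coefficients.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation lags r[0..order] of the (already windowed) frame
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    if r[0] <= 0.0:
        raise ValueError("frame has zero energy")
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    refl = np.zeros(order)
    for m in range(1, order + 1):
        # Reflection coefficient for stage m
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err
        refl[m - 1] = k
        # Order update: a_m[i] = a_{m-1}[i] + k * a_{m-1}[m - i]
        prev = a.copy()
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err, refl

def lpc_envelope_db(a, err, n_freqs=256):
    """Smooth spectral envelope err / |A(e^{jw})|^2 in dB on [0, pi]."""
    w = np.linspace(0.0, np.pi, n_freqs)
    A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
    return w, 10.0 * np.log10(err / np.abs(A) ** 2)
```

Filtering a frame by $A(z)$ yields the approximately flat-spectrum excitation (the prediction error), while the envelope $1/|A(e^{j\omega})|^2$, scaled by the error energy, carries the formant structure. Orders of roughly 10 to 16 are common for speech at low sampling rates, though the appropriate order depends on the sampling rate and application.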

LPC has been used quite often as a spectral transformation technique in computer music, as well as for general-purpose audio spectral envelopes [384], and it remains much used for low-bit-rate speech coding in the variant known as Codebook Excited Linear Prediction (CELP) [340]. When applying LPC to audio at high sampling rates, it is important to carry out some kind of auditory frequency warping, such as according to mel, Bark, or ERB frequency scales [183,461,485].

Interestingly, it was recognized from the beginning that the all-pole LPC vocal-tract model could be interpreted as a modified piecewise-cylindrical acoustic-tube model [20,299], and this interpretation was most explicit when the vocal-tract filters (computed by LPC in direct form) were realized as ladder filters [299]. The physical interpretation is not really valid, however, unless the vocal-tract filter parameters are estimated jointly with a realistic glottal pulse shape. LPC demands that the vocal tract be driven by a flat spectrum, either an impulse (or low-pitched impulse train) or white noise, which is not physically accurate. When the glottal pulse shape (and lip radiation characteristic) are "factored out", it becomes possible to convert LPC coefficients into vocal-tract shape parameters (area ratios). Approximate results can be obtained by assuming a simple roll-off characteristic for the glottal pulse spectrum (e.g., -12 dB/octave) and lip-radiation frequency response (nominally +6 dB/octave), and compensating with a simple preemphasis characteristic (e.g., $12 - 6 = 6$ dB/octave) [299]. More accurate glottal pulse estimation in terms of parameters of the derivative-glottal-wave models by Liljencrants, Fant, and Klatt [134,259] (still assuming +6 dB/octave for lip radiation) was carried out in the thesis research of Vicky Lu [292], and further extension of that work appears in [252,214,253].
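The approximate route from LPC coefficients to area ratios can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the first-order preemphasis $1 - \mu z^{-1}$ with $\mu = 0.95$ is a common stand-in for a +6 dB/octave characteristic rather than a value taken from the references, and the area-ratio formula $(1 - k_m)/(1 + k_m)$ depends on a sign convention and tube ordering (glottis to lips or the reverse) that vary between texts.

```python
import numpy as np

def preemphasize(x, mu=0.95):
    """First-order preemphasis y[n] = x[n] - mu * x[n-1] (mu is an assumption)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= mu * x[:-1]
    return y

def poly_to_reflection(a):
    """Step-down recursion: A(z) = 1 + a1 z^-1 + ... -> reflection coefficients."""
    cur = np.asarray(a, dtype=float)
    cur = cur / cur[0]
    p = len(cur) - 1
    k = np.zeros(p)
    for m in range(p, 0, -1):
        km = cur[m]
        if abs(km) >= 1.0:
            raise ValueError("A(z) is not minimum phase; no valid tube model")
        k[m - 1] = km
        # Invert the order update a_m[i] = a_{m-1}[i] + km * a_{m-1}[m - i]
        prev = np.zeros(m)
        prev[0] = 1.0
        for i in range(1, m):
            prev[i] = (cur[i] - km * cur[m - i]) / (1.0 - km * km)
        cur = prev
    return k

def reflection_to_areas(k, first_area=1.0):
    """Relative tube-section areas from reflection coefficients.

    Assumes the ratio of adjacent areas is (1 - k_m) / (1 + k_m); other
    texts use the reciprocal or the opposite sign for k_m.
    """
    k = np.asarray(k, dtype=float)
    ratios = (1.0 - k) / (1.0 + k)
    return first_area * np.concatenate(([1.0], np.cumprod(ratios)))
```

In use, a speech frame would first be passed through `preemphasize`, then analyzed by LPC to obtain the coefficients of $A(z)$, which are finally converted with `poly_to_reflection` and `reflection_to_areas`. Only the area ratios are determined; the absolute scale (`first_area`) is arbitrary.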




``Physical Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2010, ISBN 978-0-9745607-2-4
Copyright © 2023-08-20 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University