A quick look at what the scalable sampling rate (SSR) profile of MPEG-2 AAC [2] does is useful for understanding the current state of the art. The audio data is first split into four uniform subbands using a Polyphase Quadrature Filter (PQF). For each of the four subbands an individual gain is transmitted as side information. The gain-controlled subband data is then transformed using an MDCT of length 256 (or 32 for transient conditions). The window used for the MDCT is either the Kaiser-Bessel derived (KBD) window or the sine window; the two have different spectral characteristics and suit different signals. Under transient conditions the shorter transform is used because it gives better time resolution.
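
The windowed transform step can be illustrated with a minimal sketch. It assumes the textbook MDCT definition and a sine window, uses a toy block size instead of 256 or 32, and leaves out the PQF split and the gain control entirely. The function names (`sine_window`, `mdct`, `imdct`, `roundtrip`) are made up for this illustration and do not come from the standard.

```python
import math

def sine_window(n):
    """Sine window of length n (n even); w[i]^2 + w[i + n/2]^2 == 1."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]

def mdct(block, window):
    """Window a block of 2*m samples and return its m MDCT coefficients."""
    n = len(block)
    m = n // 2
    shift = 0.5 + m / 2.0
    return [
        sum(window[i] * block[i]
            * math.cos(math.pi / m * (i + shift) * (k + 0.5))
            for i in range(n))
        for k in range(m)
    ]

def imdct(coeffs, window):
    """Inverse MDCT: m coefficients -> 2*m windowed, time-aliased samples."""
    m = len(coeffs)
    shift = 0.5 + m / 2.0
    return [
        window[i] * (2.0 / m) * sum(
            coeffs[k] * math.cos(math.pi / m * (i + shift) * (k + 0.5))
            for k in range(m))
        for i in range(2 * m)
    ]

def roundtrip(signal, m):
    """MDCT followed by IMDCT with 50% overlap-add (hop size m)."""
    window = sine_window(2 * m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        block = signal[start:start + 2 * m]
        piece = imdct(mdct(block, window), window)
        for i, value in enumerate(piece):
            out[start + i] += value
    return out

m = 8  # toy size, chosen only to keep the example fast
signal = [math.sin(0.37 * t) + 0.5 * math.cos(1.3 * t) for t in range(4 * m)]
rebuilt = roundtrip(signal, m)
# Only samples covered by two overlapping blocks are fully reconstructed.
max_err = max(abs(a - b) for a, b in zip(signal[m:3 * m], rebuilt[m:3 * m]))
print(max_err)
```

Each block on its own contains time-domain aliasing; it cancels only when neighbouring blocks are overlapped and added. A KBD window could be swapped in for `sine_window`, since it satisfies the same power-complementarity condition.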

The MDCT coefficients are predicted from the two preceding frames, using a separate LMS-adapted (least mean squares) predictor for every frequency band. This improves coding efficiency for stationary signals. The prediction residuals are non-uniformly quantized and coded using one of 12 different Huffman codes.
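
The per-band prediction can be sketched as follows. This is an illustration only: it uses a plain two-tap normalized LMS update, which is an assumption standing in for whatever predictor structure the standard specifies, and the class name `TwoTapLMS`, the step size `mu`, and the test sequence are all invented for the example.

```python
import math

class TwoTapLMS:
    """Predicts one coefficient from its values in the two preceding frames."""

    def __init__(self, mu=0.5, eps=1e-9):
        self.weights = [0.0, 0.0]
        self.history = [0.0, 0.0]  # values from frames t-1 and t-2
        self.mu = mu               # step size (illustrative choice)
        self.eps = eps             # avoids division by zero

    def predict(self):
        return (self.weights[0] * self.history[0]
                + self.weights[1] * self.history[1])

    def update(self, actual):
        """Return the residual for this frame, then adapt the weights."""
        err = actual - self.predict()
        norm = self.history[0] ** 2 + self.history[1] ** 2 + self.eps
        step = self.mu * err / norm
        self.weights[0] += step * self.history[0]
        self.weights[1] += step * self.history[1]
        self.history = [actual, self.history[0]]
        return err

def prediction_residuals(frames):
    """frames: list of frames, each a list of coefficients (one per band)."""
    predictors = [TwoTapLMS() for _ in frames[0]]
    return [[p.update(c) for p, c in zip(predictors, frame)]
            for frame in frames]

# A stationary toy input: each band's coefficient varies sinusoidally
# from frame to frame, which a two-tap predictor can track exactly.
frames = [[math.cos(0.3 * t), 0.5 * math.cos(0.7 * t + 1.0)]
          for t in range(300)]
residuals = prediction_residuals(frames)
early = sum(r[0] ** 2 for r in residuals[:50])
late = sum(r[0] ** 2 for r in residuals[-50:])
print(early, late)
```

Once the weights have adapted, the residual energy is far below the energy of the coefficients themselves, which is where the coding gain for stationary signals comes from.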

MPEG-2 AAC has many optional extra features. One of the most
interesting is Temporal Noise Shaping (TNS), which works well for
transient signals. The idea is that a tonal signal in the time domain has
transient peaks in the frequency domain. The dual of this is that a
signal which is transient in the time domain is ``tonal'' in the frequency
domain, i.e. its spectrum, read as a sequence over frequency, consists
mainly of a few sines. ``Tonal'' sequences are easily predicted using an
LPC approach. Thus, a simple linear predictor is used to predict the next
*spectral* sample (going from low frequencies to high) from its
lower-frequency neighbors.
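
The duality can be checked numerically with a small sketch. It makes several assumptions for the sake of illustration: a plain DCT-II stands in for the MDCT, the predictor is a second-order LPC fitted with the autocorrelation method (Levinson-Durbin), and the standard's actual filter orders, coefficient quantization and band ranges are not modelled. All function names are hypothetical.

```python
import math

def dct2(x):
    """Unnormalized DCT-II, used here as a stand-in for the MDCT."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n)
                for t in range(n))
            for k in range(n)]

def autocorr(seq, max_lag):
    """Autocorrelation of seq for lags 0..max_lag."""
    return [sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))
            for lag in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion: returns (error filter, residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def spectral_prediction_gain(x, order=2):
    """Energy of the spectrum divided by energy of its LPC residual,
    with prediction running from low frequencies to high."""
    spectrum = dct2(x)
    r = autocorr(spectrum, order)
    _, err = levinson(r, order)
    return r[0] / err

n = 64
click = [0.0] * n
click[20] = 1.0  # transient in time: a single impulse
tone = [math.cos(math.pi * (t + 0.5) * 10 / n) for t in range(n)]  # tonal in time

gain_click = spectral_prediction_gain(click)
gain_tone = spectral_prediction_gain(tone)
print(gain_click, gain_tone)
```

The impulse has a spectrum that oscillates like a cosine across frequency, so predicting along the frequency axis removes most of its energy. The tone concentrates its energy in a single coefficient, which its neighbors cannot predict, so the gain stays at about 1.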

Sat Mar 7 16:27:43 PST 1998