A brief look at the scalable sampling rate (SSR) profile of MPEG-2 AAC [2] helps to illustrate the current state of the art. The audio data is first split into four uniform subbands using a Polyphase Quadrature Filter (PQF). An individual gain for each of the four subbands is transmitted as side information. The gain-controlled subband data is then transformed using an MDCT of length 256, or 32 under transient conditions, where the shorter transform gives better time resolution. The MDCT window is either the Kaiser-Bessel derived (KBD) window or the sine window; the two have different spectral characteristics and therefore suit different signals.
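The windowed MDCT step can be sketched in Python. This is a minimal illustration under assumptions, not the SSR reference implementation: it uses a small transform (N = 8 coefficients from 16 windowed samples) so that the direct O(N^2) sums stay fast, an assumed KBD shape parameter `alpha`, and hypothetical helper names (`sine_window`, `kbd_window`, `mdct`, `imdct`, `mdct_roundtrip`). It shows that both window types satisfy the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1, so 50%-overlapped blocks reconstruct the input exactly once the time-domain aliasing of neighboring blocks cancels. The PQF split and the gain control are not modelled.

```python
import math


def sine_window(two_n):
    """Sine window of length 2N: w[n] = sin(pi * (n + 0.5) / (2N))."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def bessel_i0(x, terms=60):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kbd_window(two_n, alpha):
    """Kaiser-Bessel derived window of length 2N; alpha is an assumed shape parameter."""
    n_half = two_n // 2
    # Kaiser window of length N + 1 (its normalization cancels in the ratio below).
    kaiser = [bessel_i0(math.pi * alpha * math.sqrt(1.0 - (2.0 * j / n_half - 1.0) ** 2))
              for j in range(n_half + 1)]
    total = sum(kaiser)
    first_half, running = [], 0.0
    for n in range(n_half):
        running += kaiser[n]  # cumulative sum of kaiser[0..n]
        first_half.append(math.sqrt(running / total))
    return first_half + first_half[::-1]  # symmetric: w[2N-1-n] = w[n]


def mdct(block):
    """MDCT: 2N windowed samples -> N coefficients."""
    two_n = len(block)
    n = two_n // 2
    return [sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(two_n))
            for k in range(n)]


def imdct(coeffs):
    """IMDCT with 2/N scaling, so Princen-Bradley windows give perfect reconstruction."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for i in range(2 * n)]


def mdct_roundtrip(signal, window):
    """Window -> MDCT -> IMDCT -> window -> overlap-add with hop N (50% overlap)."""
    two_n = len(window)
    n = two_n // 2
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - two_n + 1, n):
        block = [signal[start + i] * window[i] for i in range(two_n)]
        recon = imdct(mdct(block))
        for i in range(two_n):
            out[start + i] += recon[i] * window[i]
    return out
```

For a list of floats whose length is a multiple of N, `mdct_roundtrip(sig, kbd_window(16, 4.0))` returns the input except in the first and last N samples, which are covered by only one block. Swapping the sine window for the KBD window changes the spectral leakage of the coefficients, not the reconstruction.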
The MDCT coefficients are then predicted from the two preceding frames, using a separate predictor for every frequency band whose coefficients are adapted with the Least Mean Squares (LMS) algorithm. This prediction improves coding efficiency for stationary signals. The prediction residuals are non-uniformly quantized and coded with one of 12 different Huffman codes.
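A per-coefficient predictor that uses the two preceding frames can be sketched as follows. This is a simplified, hypothetical illustration: it uses a plain two-tap normalized LMS transversal predictor with an assumed step size `mu` rather than reproducing the exact predictor structure of the standard, and it adapts on exact coefficient values, whereas a real codec would adapt on the reconstructed values available to the decoder so that encoder and decoder stay in sync. The class and function names are hypothetical.

```python
import math


class CoefficientPredictor:
    """Hypothetical two-tap backward-adaptive predictor for ONE MDCT coefficient.

    Predicts the coefficient in the current frame from its values in the two
    preceding frames and adapts the two weights with normalized LMS.
    """

    def __init__(self, mu=0.5, eps=1e-12):
        self.weights = [0.0, 0.0]
        self.history = [0.0, 0.0]  # [value in frame t-1, value in frame t-2]
        self.mu = mu    # adaptation step size (assumed; NLMS is stable for 0 < mu < 2)
        self.eps = eps  # regularizer that avoids division by zero at start-up

    def predict(self):
        return sum(w * h for w, h in zip(self.weights, self.history))

    def update(self, value):
        """Return the prediction residual for `value`, then adapt and shift history."""
        residual = value - self.predict()
        energy = self.eps + sum(h * h for h in self.history)
        self.weights = [w + self.mu * residual * h / energy
                        for w, h in zip(self.weights, self.history)]
        self.history = [value, self.history[0]]
        return residual


def predict_frames(frames, mu=0.5):
    """Run one independent predictor per coefficient over a list of MDCT frames.

    Returns the residual frames that would then be quantized and coded.
    """
    predictors = [CoefficientPredictor(mu) for _ in frames[0]]
    return [[p.update(x) for p, x in zip(predictors, frame)] for frame in frames]


# Stationary example: each coefficient varies sinusoidally from frame to frame.
frames = [[math.cos(1.0 * t), math.cos(2.0 * t + 0.3)] for t in range(400)]
residuals = predict_frames(frames)
```

For such a stationary input the residual shrinks toward zero as the weights converge, which is where the coding gain comes from; for non-stationary content the residual stays large and prediction helps little.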
MPEG-2 AAC also offers many optional coding tools. One of the most interesting is Temporal Noise Shaping (TNS), which works well for transient signals. The idea is that a signal that is tonal in the time domain has sharp peaks in the frequency domain. Dually, a signal that is transient in the time domain is ``tonal'' in the frequency domain, i.e., its spectral coefficients, viewed as a sequence over frequency, consist mainly of a few sinusoids. ``Tonal'' sequences are easily predicted with an LPC approach, so a simple linear predictor is used to predict each spectral coefficient (going from low to high frequencies) from its lower-frequency neighbors. Because the encoder quantizes the prediction residual and the decoder applies the inverse filter, the quantization noise is shaped in time to follow the temporal envelope of the signal, which is what makes TNS effective for transients.
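The duality behind TNS can be checked numerically. The sketch below is an illustration under stated assumptions: an unnormalized DCT-II stands in for the MDCT, the order-2 predictor is fitted by least squares (the covariance method) rather than by whatever procedure a particular encoder uses, and the function names are hypothetical. A single click at sample n0 has DCT-II coefficients X[k] = cos(pi*k*(n0 + 0.5)/N), a pure sinusoid over k, so an order-2 predictor running from low to high frequency removes it almost completely, while the spectrum of a stationary tone is a peak that the same predictor cannot follow.

```python
import math


def dct2(x):
    """Unnormalized DCT-II, used here as a simple stand-in for the MDCT."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]


def fit_order2_over_frequency(spec):
    """Least-squares fit of X[k] ~ a1*X[k-1] + a2*X[k-2], predicting upward in frequency.

    Returns (a1, a2, residual) with residual[k-2] = X[k] - a1*X[k-1] - a2*X[k-2].
    """
    r11 = r12 = r22 = b1 = b2 = 0.0
    for k in range(2, len(spec)):
        x0, x1, x2 = spec[k], spec[k - 1], spec[k - 2]
        r11 += x1 * x1
        r12 += x1 * x2
        r22 += x2 * x2
        b1 += x0 * x1
        b2 += x0 * x2
    det = r11 * r22 - r12 * r12  # solve the 2x2 normal equations
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (r11 * b2 - r12 * b1) / det
    residual = [spec[k] - a1 * spec[k - 1] - a2 * spec[k - 2]
                for k in range(2, len(spec))]
    return a1, a2, residual


def residual_ratio(signal):
    """Residual energy / spectral energy over the predicted bins (small = predictable)."""
    spec = dct2(signal)
    _, _, res = fit_order2_over_frequency(spec)
    return sum(e * e for e in res) / sum(v * v for v in spec[2:])


N = 64
click = [1.0 if i == 20 else 0.0 for i in range(N)]             # transient in time
tone = [math.cos(2 * math.pi * 5.3 * i / N) for i in range(N)]  # stationary in time
print(residual_ratio(click), residual_ratio(tone))
```

Running it prints a residual ratio near zero for the click and a much larger one for the tone. Real TNS filters may use a higher order than 2, since a transient frame rarely contains just one click.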