In tonal parts of the signal, the frequency coefficients are highly correlated over time, since a tone corresponds to a stationary peak in the frequency domain. The encoder exploits this by always encoding four MDCT blocks at a time. To avoid artifacts at transitions, i.e., when the masking threshold changes abruptly, two modes of operation are introduced, and one of them is chosen for each band:
If Var > thresh, the Transient mode is used; otherwise, the Stationary mode is used. The threshold thresh = 0.02 used in the coder was found empirically.
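The per-band mode decision can be sketched as below. The definition of Var is not reproduced in this text, so the sketch assumes a hypothetical stand-in: the variance of the band's masking threshold over the four blocks, normalized by its squared mean. The function names `band_variation` and `choose_mode` are illustrative only; only the comparison against thresh = 0.02 comes from the text.

```python
import numpy as np

THRESH = 0.02  # empirical threshold value quoted in the text


def band_variation(thresholds):
    """HYPOTHETICAL stand-in for Var: variance of one band's masking
    threshold over the four blocks, normalized by its squared mean."""
    t = np.asarray(thresholds, dtype=float)
    mean = t.mean()
    if mean == 0.0:
        return 0.0  # silent band: treat as perfectly stationary
    return float(t.var() / mean ** 2)


def choose_mode(thresholds, thresh=THRESH):
    """Pick the coding mode for one band: Transient if Var > thresh,
    otherwise Stationary."""
    return "transient" if band_variation(thresholds) > thresh else "stationary"
```

For example, a band whose masking threshold is constant over the four blocks gives Var = 0 and is coded in Stationary mode, while a sudden jump in the last block pushes Var far above 0.02 and selects Transient mode.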
The Stationary mode exploits the energy compaction property of the KLT as follows. Since the first few KLT coefficients are likely to have higher energy than the later ones, the transform can be performed with only a subset of the basis vectors U without significant loss. Thus, the last p coefficients of the KLT are never transmitted. Experiments have shown that this works well in bands with many frequency bins, which leads to the following heuristic for determining which coefficients to skip: use the first P coefficients, where P is chosen so that
where is the band number and are the KLT transform coefficients. This heuristic ``cuts'' the transform once enough energy has been included. A larger share of the energy is required in the lower bands, where tonal instruments such as strings sound very bad without this restriction.
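The Stationary-mode transform and the energy cut can be sketched as follows. This is a minimal sketch under stated assumptions, not the coder's actual implementation: it assumes each band is arranged as a 4 x K matrix (the four MDCT blocks as rows, the K bins of the band as columns), that the KLT acts along the block axis with its 4 x 4 covariance estimated from the band's bins, and that the cut rule has the form "smallest P whose first P coefficients hold at least a fraction alpha of the band energy". The exact band-dependent equation is not reproduced here; per the text, alpha would be chosen larger for lower bands. All function names are hypothetical.

```python
import numpy as np


def klt_basis(X):
    """KLT basis for one band.

    X has shape (4, K): four MDCT blocks (rows) times K bins (columns).
    Returns a 4x4 matrix whose orthonormal columns are the eigenvectors of
    the estimated covariance, sorted by decreasing eigenvalue.
    """
    C = X @ X.T / X.shape[1]           # 4x4 covariance estimate (zero mean assumed)
    eigvals, U = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]  # largest-energy basis vector first
    return U[:, order]


def choose_P(Y, alpha):
    """Smallest P such that the first P KLT coefficient rows carry at least
    a fraction alpha of the band energy (ASSUMED form of the cut rule)."""
    energy = np.sum(Y ** 2, axis=1)    # energy of each KLT coefficient index
    total = energy.sum()
    if total == 0.0:
        return 0                       # silent band: nothing to send
    cumulative = np.cumsum(energy) / total
    P = int(np.searchsorted(cumulative, alpha)) + 1
    return min(P, len(energy))         # guard against rounding when alpha is ~1


def stationary_encode(X, alpha):
    """Return the basis, the P transmitted coefficient rows, and P."""
    U = klt_basis(X)
    Y = U.T @ X                        # KLT coefficients, shape (4, K)
    P = choose_P(Y, alpha)
    return U, Y[:P], P                 # the last 4 - P rows are never sent


def stationary_decode(U, Y_kept):
    """Inverse KLT with the untransmitted coefficients treated as zero."""
    P = Y_kept.shape[0]
    return U[:, :P] @ Y_kept
```

For a strongly tonal band, where the four rows of X are nearly identical, almost all the energy lands in the first KLT coefficient and a moderate alpha already yields P = 1; with alpha = 1.0 all four coefficients are kept and the decoder reconstructs X exactly.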
An experiment on the audio clip music.wav gives the average ``coefficient ratio'' shown in table 4.2.1, where 1.0 corresponds to sending all coefficients and 0 to sending none. The effect of the weighting equations above is clearly visible in the table. For music.wav, the overall bitrate is 121 kbit/s without the KLT and 106 kbit/s with it. Note also that the KLT option without skipping coefficients gives no bitrate savings. Thus, the only gain I get from the KLT is that the quantization noise from the zeroed coefficients can be spread over the whole band.
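The two quantitative points of this paragraph can be checked with a short sketch. The quoted rates correspond to a saving of (121 - 106) / 121, roughly 12.4 %. The noise-spreading argument rests on the KLT basis being orthonormal: zeroing the last coefficients produces a reconstruction error whose energy equals the energy of the zeroed coefficients, and that error is distributed over all blocks and bins of the band rather than confined to a few of them. The sketch assumes the same 4 x K band layout as in the Stationary-mode description; `truncation_error` and `bitrate_saving` are hypothetical helper names.

```python
import numpy as np


def truncation_error(X, U, P):
    """Zero all but the first P KLT coefficients of X (shape (4, K)) and
    return the reconstruction error and the energy of the zeroed rows."""
    Y = U.T @ X                        # forward KLT (U has orthonormal columns)
    X_hat = U[:, :P] @ Y[:P]           # inverse KLT without the last rows
    dropped_energy = float(np.sum(Y[P:] ** 2))
    return X - X_hat, dropped_energy


def bitrate_saving(rate_without, rate_with):
    """Relative bitrate reduction, e.g. for the kbit/s figures quoted above."""
    return (rate_without - rate_with) / rate_without
```

With any orthonormal 4 x 4 basis (for instance one obtained from `np.linalg.qr` of a random matrix), the squared error summed over the band matches the dropped energy, and in general every entry of the error matrix is nonzero, which is the sense in which the noise is spread over the whole band.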