
Coefficient Clustering - Time Domain

  In tonal parts of the signal, the frequency coefficients are highly correlated in the time domain, since a tone corresponds to a stationary peak in the frequency domain. The encoder exploits this by always encoding four MDCT blocks at a time. To avoid artifacts at transitions, i.e. when the masking threshold changes abruptly, two modes of operation are introduced, one of which is chosen for each band:

  1. Transient mode. The four MDCT blocks are encoded individually, so each block and band has its own quantizer step size. The MDCT coefficients are quantized and encoded as described in section 4.2.4.
  2. Stationary mode. The four MDCT blocks are jointly coded, using only one quantizer step size per band. The coefficients are transformed using a fixed KLT (section 4.1.3), then quantized and encoded. The KLT basis was estimated from a tonal mono string sequence containing about 2000 frames.
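The mode decision is based on the mean, taken over all frequencies in the band, of the estimated variance of the masking threshold across the four blocks:

  \[ \mathrm{Var} = \frac{1}{N} \sum_{k=1}^{N} \hat{\sigma}_k^2 \]

where N is the number of frequency bins in the band and \hat{\sigma}_k^2 is the estimated variance of the masking threshold at frequency k over the four blocks.

If Var > thresh, the Transient mode is used; otherwise the Stationary mode is used. The value thresh = 0.02 used in the coder was found empirically.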
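
The decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `band_mode` and the input layout (four lists, one per MDCT block, each holding the masking threshold for every bin of one band) are hypothetical, and the population variance is assumed as the variance estimate, since the text does not say which estimator or threshold scale the coder uses.

```python
from statistics import mean, pvariance

THRESH = 0.02  # empirical value quoted in the text


def band_mode(threshold_blocks, thresh=THRESH):
    """Pick the coding mode for one band.

    threshold_blocks: four sequences (one per MDCT block), each holding the
    masking-threshold value of every frequency bin in the band.
    Returns (mode, var), where mode is "transient" or "stationary".
    """
    if len(threshold_blocks) != 4:
        raise ValueError("expected exactly four MDCT blocks")
    n_bins = len(threshold_blocks[0])
    if n_bins == 0 or any(len(b) != n_bins for b in threshold_blocks):
        raise ValueError("blocks must be non-empty and of equal length")

    # Variance of the threshold over the four blocks, one value per bin.
    # ASSUMPTION: population variance (divide by 4); the source does not
    # specify the estimator.
    per_bin_var = [pvariance(values) for values in zip(*threshold_blocks)]

    # Mean of the per-bin variances over all bins in the band.
    var = mean(per_bin_var)
    return ("transient" if var > thresh else "stationary"), var


# A threshold that stays constant over the four blocks, and one that jumps.
steady = [[1.0, 2.0, 3.0]] * 4
jump = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
print(band_mode(steady)[0])
print(band_mode(jump)[0])
```

With these inputs the constant threshold has zero variance and selects the Stationary mode, while the jump gives a per-bin variance of 0.25 and selects the Transient mode.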

The Stationary mode exploits the energy compaction property of the KLT as follows. Since the first few KLT coefficients are likely to carry more energy than the later ones, the transform can be performed with only a subset of the basis vectors U without much loss. Thus, the last p coefficients from the KLT are never transmitted. Experiments have shown that this works well in bands with many frequency bins, which leads to the following heuristic for determining which coefficients to skip: use the first P coefficients, where P is chosen so that

  [equation not recoverable from the source: energy criterion for P]

The criterion depends on the band number and on the KLT transform coefficients. This heuristic "cuts" the transform once enough energy has been included. More energy is required for the lower bands, where tonal instruments, such as strings, sound very bad without that restriction.
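
The general idea of cutting the transform once enough energy has been included can be sketched as follows. This is not the coder's actual criterion, which is not reproduced here: the function names, the linear `required_fraction` rule and its 0.99 and 0.90 endpoints are all hypothetical, chosen only to show a rule that demands more energy in lower bands.

```python
def coefficients_to_keep(klt_coeffs, energy_fraction):
    """Smallest P such that the first P coefficients hold at least
    `energy_fraction` of the total energy (sum of squares)."""
    if not 0.0 <= energy_fraction <= 1.0:
        raise ValueError("energy_fraction must lie in [0, 1]")
    energies = [c * c for c in klt_coeffs]
    total = sum(energies)
    if total == 0.0:
        return 0  # nothing to send for an all-zero band
    target = energy_fraction * total
    running = 0.0
    for p, energy in enumerate(energies, start=1):
        running += energy
        if running >= target:
            return p
    return len(energies)


def required_fraction(band, n_bands, low_band=0.99, high_band=0.90):
    """HYPOTHETICAL weighting: interpolate linearly from `low_band` for
    band 0 to `high_band` for the top band, so lower bands need more energy."""
    if n_bands < 2:
        return low_band
    return low_band + (high_band - low_band) * band / (n_bands - 1)


coeffs = [4.0, 2.0, 1.0, 0.5]  # made-up KLT output with decaying energy
for band in (0, 9):
    fraction = required_fraction(band, n_bands=10)
    p = coefficients_to_keep(coeffs, fraction)
    print(band, p, p / len(coeffs))  # band, P, coefficient ratio
```

The ratio `p / len(coeffs)` corresponds to the "coefficient ratio" notion, where 1.0 means all coefficients are sent.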

An experiment on the audio clip music.wav gives the average "coefficient ratio" shown in table 4.2.1, where 1.0 corresponds to sending all coefficients and 0 to sending none. The effect of the weighting in the equations above is clearly visible in the table. For music.wav, for example, the overall bitrate is 121 kbit/s without the KLT and 106 kbit/s with it. It should also be noted that the KLT option without the skipping of coefficients gives no bitrate savings. Thus, the only gain I get from the KLT is that the quantization noise from zeroed coefficients can be spread over the whole band.
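
To put the two bitrates in proportion, the relative saving can be worked out directly. The helper name `relative_saving` is only illustrative; the two rates are the ones quoted for music.wav.

```python
def relative_saving(rate_without, rate_with):
    """Fraction of the bitrate saved relative to the rate without the KLT."""
    return (rate_without - rate_with) / rate_without


# Rates in kbit/s as quoted for music.wav.
saving = relative_saving(121.0, 106.0)
print(f"{saving:.1%}")  # prints 12.4%
```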



Bosse Lincoln
Sat Mar 7 16:27:43 PST 1998