
Human Audio Perception: Masking

As mentioned in the introduction, the human auditory system has some interesting properties that are exploited in perceptual audio coding. We hear frequencies from about 20 Hz to 20000 Hz, and we hear sounds whose intensity varies over many orders of magnitude. The hearing system may thus seem to be a very wide-range instrument, but that is not altogether true. To achieve this range, hearing is highly adaptive: what we hear depends on the audio environment we are in. In the presence of strong white noise, for example, many weaker sounds are masked (see section 3.2), and we cannot hear them at all. Some of these masking characteristics are due to the physical ear, and some are due to processing in the brain.

Using masking principles, others have performed experiments in which correctly shaped noise was added to audio data without audible effect down to an SNR of 25 dB. On the other hand, deliberately "wrongly" shaped noise, i.e. noise with high energy in sensitive areas, can be audible up to an SNR of 90 dB.
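
To give a feeling for what these two SNR figures mean, the following minimal sketch converts an SNR in dB to the RMS noise amplitude relative to the signal. It assumes the usual power-ratio definition of SNR, 10 log10(P_signal / P_noise); the function names are hypothetical and are not taken from the coder.

```python
import math

def snr_db(signal, noise):
    """SNR in dB of two equal-length sample sequences (power-ratio definition)."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

def noise_amplitude_ratio(snr):
    """RMS noise amplitude relative to RMS signal amplitude for an SNR given in dB."""
    return 10.0 ** (-snr / 20.0)

# Well-shaped noise at 25 dB SNR: noise RMS is about 5.6 % of the signal RMS.
ratio_25 = noise_amplitude_ratio(25.0)
# Badly shaped noise at 90 dB SNR: noise RMS is about 0.003 % of the signal RMS.
ratio_90 = noise_amplitude_ratio(90.0)
```

The two cases differ by 65 dB, a factor of roughly 1800 in amplitude, which is why the spectral shape of the noise matters far more than its total energy.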

I will now present some of the most important masking properties of the ear and the models of them. The coder combines these models to produce a masking threshold curve every 256 samples (5.8 ms), which is used to quantize the audio data. According to the model, noise below that threshold is completely inaudible to the listener. See section 3.4 for a description of how the masking threshold is used in quantization.
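
As a rough illustration of the two ideas in this paragraph, the sketch below first checks the frame length and then compares a noise spectrum against a masking threshold. The 44100 Hz sample rate is an assumption inferred from 256 samples corresponding to 5.8 ms. The per-band dB lists and the function names are hypothetical; the actual threshold models are described in the following sections.

```python
SAMPLE_RATE_HZ = 44100  # assumption: implied by 256 samples ~ 5.8 ms
FRAME_LEN = 256         # samples between successive masking threshold curves

def frame_duration_ms(frame_len=FRAME_LEN, sample_rate=SAMPLE_RATE_HZ):
    """Duration of one threshold update interval in milliseconds."""
    return 1000.0 * frame_len / sample_rate

def audible_bands(noise_db, threshold_db):
    """Indices of bands where the noise level exceeds the masking threshold.

    Both arguments are hypothetical per-band levels in dB. According to the
    model, noise in every band not returned here is inaudible.
    """
    return [i for i, (n, t) in enumerate(zip(noise_db, threshold_db)) if n > t]

duration = frame_duration_ms()  # about 5.8 ms
# Made-up levels: only band 1 has noise above its threshold.
exposed = audible_bands([10.0, 30.0, 5.0], [20.0, 25.0, 5.0])
```

A quantizer driven by such a threshold would aim to keep `exposed` empty for every frame, spending bits only where the noise would otherwise rise above the curve.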

Bosse Lincoln
Sat Mar 7 16:27:43 PST 1998