Loudness Spectrogram

The purpose of a *loudness spectrogram* is to display some
psychoacoustic model of *loudness versus time and frequency*.
Instead of specifying FFT window length and type, one specifies
*conditions of presentation*, such as physical amplitude level
in dB SPL, angle of arrival at the ears, etc. By default, it can be
assumed that the signal is presented to both ears equally, and the
listening level can be normalized to a "comfortable" value such as
70 dB SPL.^{8.6}
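
As a minimal sketch of fixing the presentation level, the following Python scales a signal so that its RMS value corresponds to a target level in dB SPL. It assumes the sample values are interpreted as sound pressure in pascals and that the level is measured as the RMS over the whole signal. The names `spl_db` and `scale_to_spl` are hypothetical helpers introduced here for illustration; only the 20 µPa reference pressure is standard.

```python
import numpy as np

P_REF = 20e-6  # reference RMS pressure for dB SPL, in pascals


def spl_db(pressure):
    """Level in dB SPL of a pressure signal given in pascals (whole-signal RMS)."""
    rms = np.sqrt(np.mean(np.square(pressure)))
    return 20.0 * np.log10(rms / P_REF)


def scale_to_spl(x, target_db=70.0):
    """Rescale x so that its RMS, read as pascals, sits at target_db dB SPL."""
    rms = np.sqrt(np.mean(np.square(x)))
    target_rms = P_REF * 10.0 ** (target_db / 20.0)
    return x * (target_rms / rms)


# Demo: a one-second 1 kHz sine of arbitrary digital amplitude (assumed test signal).
sr = 32000
t = np.arange(sr) / sr
x = 0.3 * np.sin(2 * np.pi * 1000.0 * t)
y = scale_to_spl(x, target_db=70.0)
```

Any later stage that depends on absolute level, such as level-dependent auditory filters, would then operate on `y` rather than on the raw samples.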

A time-varying model of loudness perception has been developed by Moore and Glasberg et al. [87,182,88]. A loudness spectrogram based on this work may consist of the following processing steps:

1. Compute a *multiresolution STFT* (MRSTFT) which approximates the frequency-dependent frequency and time resolution of the ear. Several FFTs of different lengths may be combined in such a way that time resolution is higher at high frequencies, and frequency resolution is higher at low frequencies, as in the ear. In each FFT, the frequency resolution must be greater than or equal to that of the ear in the frequency band it covers. (Even "much greater" is fine, since the resolution will be reduced to what it should be by the smoothing in Step 2.)
2. Form the *excitation pattern* from the MRSTFT by resampling the FFTs of the previous step using interpolation kernels shaped like auditory filters. The new spectral sampling intervals should be proportional to the width of a *critical band of hearing* at each frequency. The shape of each interpolation kernel (auditory filter) should change with amplitude level as well as center frequency [87]. This step effectively converts the uniform filter bank of the FFT to an auditory filter bank.^{8.7}
3. Compute the *specific loudness* from the excitation pattern for each frame. This step implements a compressive nonlinearity which depends on the frequency and level of the excitation pattern [182]. The specific loudness can be interpreted as *loudness per ERB*.
4. If desired, the *instantaneous loudness* can be computed as the sum of the specific loudness over all frequency samples at a fixed time. Similarly, short- and long-term time-varying loudness estimates can be computed by lowpass filtering the instantaneous loudness over time [88].
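
The data flow of Steps 2 through 4 can be sketched in Python with NumPy. This is a heavily simplified illustration under stated assumptions, not the model of [87,182,88]: a single-resolution STFT stands in for the MRSTFT of Step 1; the auditory-filter kernels are level-independent roex(p) shapes (the text calls for level dependence); the ERB formulas are the commonly used Glasberg and Moore (1990) approximations, which are not given in the text; the compressive nonlinearity is a plain power law with a placeholder exponent `alpha`; and the temporal smoothing is a one-pole lowpass with an arbitrary coefficient. All function names are hypothetical.

```python
import numpy as np


def erb_hz(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore 1990 approximation)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def hz_to_erb_rate(f):
    """Frequency in Hz to ERB-rate (number of ERBs below f)."""
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def excitation_pattern(power, freqs, step_erb=0.25, fmin=50.0, fmax=15000.0):
    """Step 2: resample an FFT power spectrum onto an ERB-spaced grid.

    Assumption: level-independent roex(p) kernels, W(g) = (1 + p g) exp(-p g).
    """
    erb_grid = np.arange(hz_to_erb_rate(fmin), hz_to_erb_rate(fmax), step_erb)
    centers = erb_rate_to_hz(erb_grid)
    g = np.abs(freqs[None, :] - centers[:, None]) / centers[:, None]
    p = 4.0 * centers[:, None] / erb_hz(centers[:, None])
    weights = (1.0 + p * g) * np.exp(-p * g)
    return centers, weights @ power


def specific_loudness(excitation, alpha=0.2):
    """Step 3: placeholder compressive nonlinearity (power law, assumed exponent)."""
    return excitation ** alpha


def instantaneous_loudness(spec_loud, step_erb=0.25):
    """Step 4: sum loudness-per-ERB over the ERB-spaced frequency samples."""
    return step_erb * np.sum(spec_loud, axis=0)


def smooth(x, coeff=0.9):
    """One-pole lowpass over frames (stand-in for short/long-term loudness)."""
    y = np.empty_like(x, dtype=float)
    state = 0.0
    for n, xn in enumerate(x):
        state = coeff * state + (1.0 - coeff) * xn
        y[n] = state
    return y


# Demo: 1 kHz sine, single-resolution STFT as a stand-in for Step 1.
sr, n_fft, hop = 32000, 2048, 512
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000.0 * t)
window = np.hanning(n_fft)
starts = range(0, len(x) - n_fft + 1, hop)
frames = np.stack([x[s:s + n_fft] * window for s in starts], axis=1)
power = np.abs(np.fft.rfft(frames, axis=0)) ** 2   # shape: (bins, frames)
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

centers, excitation = excitation_pattern(power, freqs)  # (channels, frames)
spec_loud = specific_loudness(excitation)               # the "loudness spectrogram"
inst_loud = instantaneous_loudness(spec_loud)           # one value per frame
short_term = smooth(inst_loud)
```

In this sketch, `spec_loud` plays the role of the loudness spectrogram (specific loudness versus ERB-spaced frequency and time). The outputs are in arbitrary units; a calibrated model would work from absolute levels and use the level-dependent filters and nonlinearity described in the cited references.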

Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University