Loudness Spectrogram

Loudness Spectrogram

The purpose of a loudness spectrogram is to display some psychoacoustic model of loudness versus time and frequency. Instead of specifying FFT window length and type, one specifies conditions of presentation, such as physical amplitude level in dB SPL, angle of arrival at the ears, etc. By default, it can be assumed that the signal is presented to both ears equally, and the listening level can be normalized to a ``comfortable'' value such as 70 dB SPL.^8.6

A time-varying model of loudness perception has been developed by Moore and Glasberg et al. [87,182,88]. A loudness spectrogram based on this work may consist of the following processing steps:

Compute a multiresolution STFT (MRSTFT) which approximates the frequency-dependent frequency and time resolution of the ear. Several FFTs of different lengths may be combined in such a way that time resolution is higher at high frequencies, and frequency resolution is higher at low frequencies, like in the ear. In each FFT, the frequency resolution must be greater than or equal to that of the ear in the frequency band it covers. (Even ``much greater'' is ok, since the resolution will be reduced down to what it should be by smoothing in Step 2.)
Form the excitation pattern from the MRSTFT by resampling the FFTs of the previous step using interpolation kernels shaped like auditory filters. The new spectral sampling intervals should be proportional to the width of a critical band of hearing at each frequency. The shape of each interpolation kernel (auditory filter) should change with amplitude level as well as center frequency [87]. This step effectively converts the uniform filter bank of the FFT to an auditory filter bank.^8.7
Compute the specific loudness from the excitation pattern for each frame. This step implements a compressive nonlinearity which depends on the frequency and level of the excitation pattern [182]. The specific loudness can be interpreted as loudness per ERB.
If desired, the instantaneous loudness can be computed as the the sum of the specific loudness over all frequency samples at a fixed time. Similarly, short- and long-term time-varying loudness estimates can be computed as lowpass-filterings of the instantaneous loudness over time [88].

The specific loudness gives a useful definition of the ``loudness spectrogram.'' However, one might well prefer to filter it across the time dimension in the same manner that instantaneous loudness is filtered to produce short- and long-term loudness estimates versus time and frequency.

Subsections

A Note on Hop Size

[How to cite this work] [Order a printed hardcopy] [Comment on this page via email]

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2022-02-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

Loudness Spectrogram

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1. Copyright © 2022-02-28 by Julius O. Smith III Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2022-02-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University