Audio Spectrograms

Since classic spectrograms [132] typically
show *log-magnitude intensity* (dB) versus time and frequency,
and since sound-pressure level in dB is roughly proportional to
perceived *loudness*, at least at high levels
[179,276,305], we can say that a
classic spectrogram provides a reasonably good *psychoacoustic
display* for sound, provided the window length has been chosen to be
comparable to the ``integration time'' of the ear.
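
The log-magnitude display described above can be sketched in a few lines. This is only a minimal illustration, assuming NumPy and a Hann window. The function name `db_spectrogram` and the parameters `window_length`, `hop`, and `floor_db` are hypothetical choices for this example and do not come from the text. The window length is left as a free parameter, since it is the quantity that would be matched to the ear's integration time.

```python
import numpy as np

def db_spectrogram(x, window_length=512, hop=128, floor_db=-100.0):
    """Log-magnitude STFT in dB, normalized so the largest value is 0 dB.

    Returns an array of shape (num_frames, window_length // 2 + 1).
    Assumes len(x) >= window_length (illustrative sketch, no padding).
    """
    x = np.asarray(x, dtype=float)
    w = np.hanning(window_length)            # assumed window choice
    num_frames = 1 + (len(x) - window_length) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([x[m * hop : m * hop + window_length] * w
                       for m in range(num_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    peak = mag.max()
    if peak > 0.0:
        mag = mag / peak                     # peak maps to 0 dB
    db = 20.0 * np.log10(np.maximum(mag, 1e-12))
    return np.maximum(db, floor_db)          # clip to the display floor

# Example: one second of a 1 kHz sinusoid sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
S = db_spectrogram(np.sin(2 * np.pi * 1000 * t))
```

Each row of `S` is one time frame and each column one uniformly spaced frequency bin, so plotting `S.T` as an image gives the classic dB-versus-time-and-frequency display.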

However, there are several ways we can improve the classic spectrogram
to obtain more psychoacoustically faithful displays of *perceived
loudness* versus time and frequency:

- Loudness perception is closer to *linearly* related to amplitude at low loudness levels.
- Since the STFT offers only one ``integration time'' (the window length), it implements a *uniform bandpass filter bank*; *i.e.*, spectral samples are uniformly spaced and correspond to equal bandwidths. The window transform gives the shape of each effective bandpass filter in the frequency domain. The choice of window length determines the common time- and frequency-resolution at all frequencies. Figure 9.14 shows a frequency-response overlay of all 5 channel filters created by a length-5 DFT using a zero-phase rectangular window. In the ear, however, time resolution increases and frequency resolution decreases at higher frequencies. Thus, the ear implements a *non-uniform filter bank*, with wider bandwidths at higher frequencies. In the time domain, the integration time (effective ``window length'') becomes shorter at higher frequencies.
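
The uniform filter-bank view of the length-5 DFT with a zero-phase rectangular window can be checked numerically. The sketch below assumes NumPy; the helper names `rect_window_transform` and `channel_responses` are hypothetical, introduced only for illustration. It treats each channel filter as the rectangular-window transform shifted to the channel center frequency, normalized here to unity gain at the center (an assumption made for convenience).

```python
import numpy as np

def rect_window_transform(omega, M):
    """Transform of a zero-phase rectangular window of odd length M,
    sin(M*w/2) / (M*sin(w/2)), scaled to unity gain at w = 0."""
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    num = np.sin(M * omega / 2.0)
    den = M * np.sin(omega / 2.0)
    out = np.ones_like(omega)        # limiting value where den -> 0 (odd M)
    nz = np.abs(den) > 1e-12
    out[nz] = num[nz] / den[nz]
    return out

def channel_responses(N, omega):
    """Row k is the response of DFT channel k: the same window
    transform shifted to the center frequency w_k = 2*pi*k/N."""
    centers = 2.0 * np.pi * np.arange(N) / N
    return np.stack([rect_window_transform(omega - wk, N) for wk in centers])

N = 5
centers = 2.0 * np.pi * np.arange(N) / N
# Evaluate every channel filter at every channel center frequency.
H = channel_responses(N, centers)
```

Every row of `channel_responses` is the same shape shifted in frequency, which is what "equal bandwidths" means here: the main-lobe width is set by the window length alone and does not depend on the channel. Evaluated at the center frequencies, `H` is numerically the identity matrix, since each channel has unity gain at its own center and a null at every other center.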

Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University