Summary

In this chapter, we looked at a variety of time-frequency displays appropriate for audio signals. All were implemented in terms of the short-time Fourier transform (STFT). The classical spectrogram was reviewed, and its performance on a speech sample was illustrated. A loudness spectrogram based on a model of time-varying loudness perception [88] was discussed. In this model, the STFT (or a multiresolution STFT) is smoothed and non-uniformly resampled in frequency to approximate an auditory filter bank, whose power output is taken to be the excitation pattern. A compressive nonlinearity is then applied to produce the specific loudness, which we took as our loudness spectrogram. The specific loudness can optionally be smoothed with respect to time to form a short- or long-term loudness spectrogram. Summing over frequency yields the corresponding loudness functions versus time.
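The processing chain summarized above can be sketched in Python with NumPy. This is a minimal illustration, not the model of [88]. It assumes a single-resolution Hann-windowed STFT, Gaussian weighting on the Glasberg-Moore ERB-number scale as a stand-in for the auditory filters, a plain power law (exponent 0.2) as the compressive nonlinearity, and a single one-pole lowpass for temporal smoothing. All function names and parameter values are hypothetical.

```python
import numpy as np


def stft_power(x, fs, win_len=1024, hop=256):
    """Classical spectrogram: squared STFT magnitude with a Hann window.

    Returns P with shape (n_frames, win_len // 2 + 1) and the bin
    frequencies in Hz.
    """
    if len(x) < win_len:
        raise ValueError("signal shorter than one window")
    w = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[m * hop: m * hop + win_len] * w
                       for m in range(n_frames)])
    X = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return np.abs(X) ** 2, freqs


def hz_to_erb_number(f):
    """Glasberg-Moore (1990) ERB-number scale (an assumption of this sketch)."""
    return 21.4 * np.log10(4.37 * np.asarray(f) / 1000.0 + 1.0)


def excitation_pattern(P, freqs, n_bands=40, width=0.5):
    """Smooth and non-uniformly resample P in frequency.

    Hypothetical stand-in for an auditory filter bank: each band is a
    Gaussian weighting (std `width`, in ERB-number units) centered on a
    uniform grid along the ERB-number axis, so band spacing in Hz grows
    with frequency. Returns the band powers (excitation) for every frame.
    """
    erb = hz_to_erb_number(freqs)
    centers = np.linspace(erb[1], erb[-1], n_bands)  # skip the DC bin
    W = np.exp(-0.5 * ((erb[None, :] - centers[:, None]) / width) ** 2)
    return P @ W.T, centers


def specific_loudness(E, alpha=0.2):
    """Compressive nonlinearity: a simple power law (assumed exponent)."""
    return E ** alpha


def smooth_time(N, a=0.9):
    """One-pole lowpass along the frame axis (axis 0), starting from zero."""
    out = np.empty_like(N)
    state = np.zeros(N.shape[1])
    for m in range(N.shape[0]):
        state = a * state + (1.0 - a) * N[m]
        out[m] = state
    return out
```

For example, `loudness = smooth_time(specific_loudness(E), a=0.9).sum(axis=1)` gives one loudness value per frame by summing the smoothed specific loudness over frequency. Moving `a` toward 1 slows the smoother (closer to a long-term loudness contour), while lowering it tracks faster changes (closer to short-term loudness). These settings are illustrative only and are not the time constants used in [88].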

FFT-based non-uniform filter banks, providing more efficient loudness spectrograms, are discussed in §10.7.

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2022-02-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University