It is well known that the frequency resolution of human hearing
decreases with frequency [#!FastlAndZwicker06!#,#!StevensAndDavis38!#].
As a result, any ``auditory filter bank'' must be a *nonuniform*
filter bank in which the channel bandwidths increase with frequency
over most of the spectrum. A classic approximate example is the
*third-octave filter bank*.^{11.14} A simpler (cruder)
approximation is the *octave filter bank*,^{11.15} also called a
*dyadic filter bank* when implemented using a binary tree
structure [#!Vaidyanathan93!#]. Both are examples of
*constant-Q filter banks*
[#!Brown91!#,#!BrownAndPuckette92!#,#!SchoerkhuberAndKlapuriSMC2010!#], in
which the bandwidth of each filter-bank channel is proportional to
center frequency [#!JOSFP!#]. Approximate auditory filter banks,
such as constant-Q filter banks, have extensive applications in
computer music, audio engineering, and basic hearing research.
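
As a minimal sketch of the constant-Q property, the following lays out band edges for octave and third-octave filter banks and checks that the ratio of center frequency to bandwidth is the same in every channel. The function names, the 125 Hz to 8 kHz range, and the choice of exact base-2 spacing with band edges at the geometric midpoints between centers are illustrative assumptions, not taken from the text; standardized third-octave banks use rounded nominal frequencies instead.

```python
import math

def constant_q_bands(f_min, f_max, bands_per_octave):
    """Return a list of (lower, center, upper) band edges in Hz.

    Centers are geometrically spaced starting at f_min, and each edge sits
    at the geometric midpoint between adjacent centers (an assumed layout).
    """
    ratio = 2.0 ** (1.0 / bands_per_octave)   # spacing between adjacent centers
    half_step = math.sqrt(ratio)              # center-to-edge frequency ratio
    # Number of centers that fit in [f_min, f_max], with a small rounding guard.
    count = int(math.floor(bands_per_octave * math.log2(f_max / f_min) + 1e-9)) + 1
    bands = []
    for k in range(count):
        center = f_min * ratio ** k
        bands.append((center / half_step, center, center * half_step))
    return bands

def quality_factor(band):
    """Q = center frequency divided by bandwidth."""
    lower, center, upper = band
    return center / (upper - lower)

octave = constant_q_bands(125.0, 8000.0, 1)
third_octave = constant_q_bands(125.0, 8000.0, 3)

for name, bank in (("octave", octave), ("third-octave", third_octave)):
    qs = [quality_factor(b) for b in bank]
    print(name, len(bank), "bands, Q from", round(min(qs), 4), "to", round(max(qs), 4))
```

Under these assumptions the bandwidth grows in direct proportion to center frequency, so Q is constant across the bank, and the third-octave bank has a higher Q (narrower relative bandwidth) than the octave bank.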

If the output signals from all channels of a constant-Q filter bank
are *sampled* at a particular time, we obtain what may be called
a constant-Q *transform* [#!Brown91!#]. A constant-Q transform
can be efficiently implemented by smoothing the output of a Fast
Fourier Transform (FFT) [#!BrownAndPuckette92!#]. More generally, a
*multiresolution spectrogram* can be implemented by combining
FFTs of different lengths and advancing the FFTs forward through time
(§7.3). Such nonuniform filter banks can also be
implemented using the Goertzel algorithm [#!CassidyAndSmith07!#].
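
For reference, the Goertzel algorithm computes a single DFT bin with a second-order real recursion, which suits nonuniform banks because each channel can use its own block length. The sketch below is a generic textbook form of the recursion for an integer bin index, not the specific implementation in the cited work; the helper `dft_bin` and the test signal are hypothetical and serve only as a cross-check.

```python
import cmath
import math

def goertzel(x, k):
    """Return DFT bin k of the length-N sequence x via the Goertzel recursion."""
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # s[m-1]
    s2 = 0.0  # s[m-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2   # s[m] = x[m] + 2 cos(w) s[m-1] - s[m-2]
        s2, s1 = s1, s0
    # For integer k, X[k] = exp(j w) * s[N-1] - s[N-2].
    return cmath.exp(1j * w) * s1 - s2

def dft_bin(x, k):
    """Direct evaluation of one DFT bin, used here only as a reference."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(x))

# Hypothetical test signal: a cosine centered exactly on bin 5 of a 64-point block.
N = 64
x = [math.cos(2.0 * math.pi * 5 * m / N) for m in range(N)]
worst = max(abs(goertzel(x, k) - dft_bin(x, k)) for k in range(N // 2 + 1))
print("largest deviation from the direct DFT:", worst)
```

In a constant-Q setting one would call such a routine with a longer block for low-frequency channels and a shorter block for high-frequency channels, so that each channel spans roughly the same number of periods of its center frequency.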

While the topic of *filter banks* is well developed in the literature,
including constant-Q, nonuniform FFT-based, and wavelet filter banks,
the simple, robust methods presented in this section appear to be new
[#!SmithFB09!#].
In particular, classic nonuniform FFT filter banks as described in
[#!RabinerAndSchafer78!#] have not offered the *perfect
reconstruction* property [#!Vaidyanathan93!#] in which the
filter-bank sum yields the input signal exactly (to within a delay
and/or scale factor) when the filter-band signals are not modified.
The voluminous literature on perfect-reconstruction filter banks
[#!Vaidyanathan93!#] addresses nonuniform filter banks, such as
dyadic filter banks based on pseudo quadrature mirror filter (QMF)
designs, but simpler STFT methods do not yet appear to be
incorporated. In the cosine-modulated filter-bank domain, subband
DCTs have been used in a related way [#!ZijingAndYun07!#], but
apparently without consideration for the possibility of a common time
domain across multiple channels.^{11.16}

This section can be viewed as an extension of
[#!BrownAndPuckette92!#] to the FFT filter-bank case. Alternatively,
it may be viewed as a novel method for nonuniform FIR filter-bank
design and implementation, based on STFT methodology, with arbitrarily
accurate reconstruction and controlled aliasing in the downsampled
case. While we consider only auditory (approximately constant-Q)
filter banks, the method works equally well for arbitrary nonuniform
spectral partitions and *overlap-add decompositions* in the frequency
domain.
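
The idea of a nonuniform spectral partition that sums back to the input can be illustrated with a single DFT frame. The following is a deliberately simplified sketch and not the method developed in this section: it assumes non-overlapping rectangular bands in the frequency domain, one frame, and no downsampling, and the names `octave_partition` and `split_bands` are hypothetical. Because every DFT bin is assigned to exactly one band, the band signals sum to the original frame to within rounding error.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def octave_partition(n):
    """Split bins 0..n/2 into ranges [lo, hi) whose widths double: dc, 1, 2-3, 4-7, ..."""
    half = n // 2
    bands = [(0, 1)]
    lo = 1
    while lo <= half:
        hi = min(2 * lo, half + 1)
        bands.append((lo, hi))
        lo = hi
    return bands

def split_bands(x, bands):
    """Return one real time-domain signal per band (rectangular spectral windows)."""
    n = len(x)
    X = dft(x)
    outputs = []
    for lo, hi in bands:
        Y = [0j] * n
        for k in range(lo, hi):
            Y[k] = X[k]
            Y[(n - k) % n] = X[(n - k) % n]  # keep the mirror bin so the output is real
        outputs.append([v.real for v in idft(Y)])
    return outputs

N = 32
x = [math.sin(0.3 * m) + 0.5 * math.cos(1.7 * m) for m in range(N)]
bands = octave_partition(N)
channels = split_bands(x, bands)
reconstruction = [sum(ch[m] for ch in channels) for m in range(N)]
error = max(abs(a - b) for a, b in zip(x, reconstruction))
print(bands, "max reconstruction error:", error)
```

Replacing the rectangular bands with overlapping spectral windows that still sum to one across all bins would preserve the same reconstruction property while giving smoother channel responses.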

Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University