It is well known that the frequency resolution of human hearing decreases with frequency [FastlAndZwicker06, StevensAndDavis38]. As a result, any "auditory filter bank" must be a nonuniform filter bank in which the channel bandwidths increase with frequency over most of the spectrum. A classic approximate example is the third-octave filter bank. A simpler (cruder) approximation is the octave filter bank, also called a dyadic filter bank when implemented using a binary tree structure [Vaidyanathan93]. Both are examples of constant-Q filter banks [Brown91, BrownAndPuckette92, SchoerkhuberAndKlapuriSMC2010], in which the bandwidth of each filter-bank channel is proportional to its center frequency [JOSFP]. Approximate auditory filter banks, such as constant-Q filter banks, have extensive applications in computer music, audio engineering, and basic hearing research.
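To make the constant-Q property concrete, the following minimal sketch lists band edges for third-octave and octave spacings and computes each band's Q (center frequency divided by bandwidth). The function names, the 100 Hz and 125 Hz starting frequencies, and the convention of placing band edges at the geometric mean of adjacent centers are assumptions chosen for illustration; they are not taken from the text.

```python
import math

def constant_q_bands(f_min, f_max, bands_per_octave):
    """Return (lower_edge, center, upper_edge) tuples in Hz.

    Centers are spaced geometrically, bands_per_octave per octave, and the
    edges sit at the geometric mean of adjacent centers (an assumed convention).
    """
    edge_ratio = 2.0 ** (0.5 / bands_per_octave)  # center-to-edge frequency ratio
    bands = []
    k = 0
    while True:
        center = f_min * 2.0 ** (k / bands_per_octave)
        if center > f_max:
            break
        bands.append((center / edge_ratio, center, center * edge_ratio))
        k += 1
    return bands

def q_factor(band):
    """Q = center frequency / bandwidth for one (lower, center, upper) band."""
    lower, center, upper = band
    return center / (upper - lower)

third_octave = constant_q_bands(100.0, 8000.0, 3)  # hypothetical frequency range
octave = constant_q_bands(125.0, 8000.0, 1)        # hypothetical frequency range

for band in third_octave[:4]:
    lower, center, upper = band
    print(f"center {center:7.1f} Hz  bandwidth {upper - lower:6.1f} Hz  Q {q_factor(band):.3f}")
```

Under these assumptions every third-octave band has Q of about 4.32 and every octave band has Q of about 1.41, regardless of center frequency, while the bandwidth in Hz grows in proportion to the center frequency.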
If the output signals of all channels of a constant-Q filter bank are sampled at a particular time, we obtain what may be called a constant-Q transform [Brown91]. A constant-Q transform can be efficiently implemented by smoothing the output of a Fast Fourier Transform (FFT) [BrownAndPuckette92]. More generally, a multiresolution spectrogram can be implemented by combining FFTs of different lengths and advancing the FFTs forward through time (§7.3). Such nonuniform filter banks can also be implemented based on the Goertzel algorithm [CassidyAndSmith07].
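As a rough illustration of how a Goertzel recursion can evaluate one spectral sample per channel, the sketch below measures a 440 Hz test tone in three channels whose analysis lengths shrink as center frequency rises, so that each window spans about the same number of periods. The rectangular window, the value Q = 17, the 8 kHz sampling rate, and all names are assumptions for illustration; this is not claimed to be the implementation of [CassidyAndSmith07].

```python
import cmath
import math

def goertzel(x, k):
    """Evaluate the length-N DFT sample X[k] of x (N = len(x)) by a second-order recursion.

    For integer k the return value equals sum_m x[m] * exp(-2j*pi*k*m/N).
    For non-integer k it equals that sum times a unit-modulus phase factor,
    so its magnitude is still the spectral magnitude at that frequency.
    """
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # s[m-1]
    s2 = 0.0  # s[m-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    # One extra step with zero input gives y[N] = e^{jw} s[N-1] - s[N-2]
    return cmath.exp(1j * w) * s1 - s2

sample_rate = 8000.0  # Hz (assumed)
q = 17.0              # assumed Q, close to semitone resolution
tone = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate) for n in range(1024)]

magnitudes = {}
for center in (220.0, 440.0, 880.0):
    length = round(q * sample_rate / center)  # about q periods of the center frequency
    k = center * length / sample_rate         # generally a non-integer bin index
    magnitudes[center] = abs(goertzel(tone[:length], k)) / length

for center, magnitude in magnitudes.items():
    print(f"{center:6.1f} Hz channel: normalized magnitude {magnitude:.4f}")
```

The 440 Hz channel should report a normalized magnitude near 0.5 (half the amplitude of a unit sinusoid), with much smaller values in the neighboring channels. In practice a tapered window would be used instead of the rectangular one to reduce leakage.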
While the topic of filter banks is well developed in the literature, including constant-Q, nonuniform FFT-based, and wavelet filter banks, the simple, robust methods presented in this section appear to be new [SmithFB09]. In particular, classic nonuniform FFT filter banks as described in [RabinerAndSchafer78] have not offered the perfect reconstruction property [Vaidyanathan93], in which the filter-bank sum yields the input signal exactly (to within a delay and/or scale factor) when the filter-band signals are not modified. The voluminous literature on perfect-reconstruction filter banks [Vaidyanathan93] addresses nonuniform filter banks, such as dyadic filter banks based on pseudo quadrature mirror filter designs, but simpler STFT methods do not yet appear to be incorporated. In the cosine-modulated filter-bank domain, subband DCTs have been used in a related way [ZijingAndYun07], but apparently without consideration for the possibility of a common time domain across multiple channels.
This section can be viewed as an extension of [BrownAndPuckette92] to the FFT filter-bank case. Alternatively, it may be viewed as a novel method for nonuniform FIR filter-bank design and implementation, based on STFT methodology, with arbitrarily accurate reconstruction and controlled aliasing in the downsampled case. While we consider only auditory (approximately constant-Q) filter banks, the method works equally well for arbitrary nonuniform spectral partitions and overlap-add decompositions in the frequency domain.
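The idea of a nonuniform spectral partition whose band signals sum back to the input can be sketched for a single frame as follows. This is only the simplest possible case, assumed here for illustration: non-overlapping rectangular bands on an octave grid of DFT bins, no downsampling, and a naive DFT in place of an FFT. The function names and the frame length of 64 are hypothetical, and the sketch is not the method developed in this section.

```python
import cmath
import math
import random

def dft(x, inverse=False):
    """Naive O(N^2) DFT (or inverse DFT), adequate for a short demonstration frame."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def octave_bin_edges(n):
    """Bin edges 0, 1, 2, 4, ... covering bins 0..n/2, with the top band ending at Nyquist."""
    edges = [0, 1]
    while 2 * edges[-1] < n // 2:
        edges.append(2 * edges[-1])
    edges.append(n // 2 + 1)
    return edges

def split_into_bands(x, edges):
    """Return one real time-domain signal per band of the rectangular spectral partition."""
    n = len(x)
    spectrum = dft(x)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = [0j] * n
        for k in range(lo, hi):
            masked[k] = spectrum[k]
            mirror = (n - k) % n  # matching negative-frequency bin keeps the band real
            masked[mirror] = spectrum[mirror]
        bands.append([v.real for v in dft(masked, inverse=True)])
    return bands

frame_length = 64                   # hypothetical frame length (power of two)
rng = random.Random(0)
frame = [rng.uniform(-1.0, 1.0) for _ in range(frame_length)]

edges = octave_bin_edges(frame_length)
bands = split_into_bands(frame, edges)
reconstruction = [sum(band[m] for band in bands) for m in range(frame_length)]
max_error = max(abs(a - b) for a, b in zip(frame, reconstruction))
print(f"{len(bands)} bands, max reconstruction error {max_error:.2e}")
```

Because every DFT bin is assigned to exactly one band, the band signals sum to the original frame to within roundoff. Overlapping (tapered) band windows and per-band downsampling, which the text alludes to, are deliberately left out of this sketch.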