Inverse Transforming STFT Bin Groups

The solution proposed in [264] is to compute
*multiple* time samples for each high-frequency channel, so that
one hop of the FFT produces all needed samples in each band. That way
all channels can use the same hop-size without redundancy. If the
narrowest band gets one sample per hop, then a band
times as wide
must produce
samples per hop.

A fast way to compute multiple time samples from the frequency-samples
(FFT bins) of a given band is an inverse FFT, as shown in Fig.10.30.
In matlab notation, let `X(1:N)` denote the FFT (length `N`) of the
current frame of data in the STFT, and denote the lower and upper
spectral samples for band `k` by `lsn(k)` and `hsn(k)`,
respectively. Then we may specify the matlab computation of the
full-rate time-domain signal corresponding to band `k` as follows:

BandK = X(lsn(k):hsn(k)); z1 = zeros(1,lsn(k)-1); z2 = zeros(1,N-hsn(k)); BandKzp = [z1, BandK, z2]; % (1) x(k,:) = ifft(BandKzp);where

Let denote the number of FFT bins in band . When is a power of 2, we can apply an inverse FFT only to the nonzero samples in the band:

xd{k} = ifft(BandK);where

We may relate `x(k,:)` and `xd{k}` by noting that, when
`Lk = N/Nk` is an integer, we have that the relation

BandK == alias(BandKzp,Lk) % (2)is true, for each element, where

function y = alias(x,L) Nx = length(x); Np = Nx/L; % aliasing-partition length x = x(:); % ensure column vector y = zeros(Np,1); for i=1:L y = y + x(1+(i-1)*Np : i*Np); end

By the *aliasing theorem* (*downsampling theorem*) for
Discrete Fourier Transforms (DFTs) (§2.3.12), the relation
`(2)` in the frequency domain corresponds to

xd{k} == Lk * x(k,1:Lk:end)in the time domain,

We see that taking an inverse FFT of only the bins in a given channel
computes the critically downsampled signal for that channel.
Downsampling factors between 1 and `Lk` can be implemented by choosing
a zero-padding factor for the band somewhere between 1 and `Lk`
samples.

Note that this filter bank has the *perfect reconstruction* (PR)
property [286]. That is, the original input signal
can be exactly reconstructed (to within a delay and possible scale
factor) from the channel signals.
The PR property follows immediately
from the exact invertibility of the discrete Fourier transform.
Specifically, we can recover the
original signal frame by taking an FFT of each channel-signal frame
and abutting the spectral bins so computed to form the original frame
spectrum. Of course, the underlying STFT (uniform filter bank) must
be PR as well, as it routinely is in practice.

[How to cite this work] [Order a printed hardcopy] [Comment on this page via email]

[Lecture Video] [Exercises] [Examination]

Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University