Inverse Transforming STFT Bin Groups

Inverse Transforming STFT Bin Groups

The solution proposed in [265] is to compute multiple time samples for each high-frequency channel, so that one hop of the FFT produces all needed samples in each band. That way all channels can use the same hop-size without redundancy. If the narrowest band gets one sample per hop, then a band times as wide must produce samples per hop.

A fast way to compute multiple time samples from the frequency-samples (FFT bins) of a given band is an inverse FFT, as shown in Fig.10.30. In matlab notation, let X(1:N) denote the FFT (length N) of the current frame of data in the STFT, and denote the lower and upper spectral samples for band k by lsn(k) and hsn(k), respectively. Then we may specify the matlab computation of the full-rate time-domain signal corresponding to band k as follows:

BandK = X(lsn(k):hsn(k));
z1 = zeros(1,lsn(k)-1);
z2 = zeros(1,N-hsn(k));
BandKzp = [z1, BandK, z2];     % (1)
x(k,:) = ifft(BandKzp);

where x(k,:) denotes the output signal vector (length N) for the kth filter-bank channel for the current time-domain STFT window.

Let $\texttt{Nk}= \texttt{hsn(k)} - \texttt{lsn(k)} + 1$ denote the number of FFT bins in band . When $\texttt{Nk}$ is a power of 2, we can apply an inverse FFT only to the nonzero samples in the band:

xd{k} = ifft(BandK);

where xd{k} now denotes a cell array for holding the ``downsampled'' signal vectors (since the downsampling factor is typically different for different bands).

We may relate x(k,:) and xd{k} by noting that, when Lk = N/Nk is an integer, we have that the relation

BandK == alias(BandKzp,Lk)     % (2)

is true, for each element, where alias(x,K) denotes aliasing of K equal partitions of the vector x (§2.3.12):^11.17

function y = alias(x,L)
  Nx = length(x);
  Np = Nx/L; % aliasing-partition length
  x = x(:);  % ensure column vector
  y = zeros(Np,1);
  for i=1:L
    y = y + x(1+(i-1)*Np : i*Np);
  end

By the aliasing theorem (downsampling theorem) for Discrete Fourier Transforms (DFTs) (§2.3.12), the relation (2) in the frequency domain corresponds to

xd{k} == Lk * x(k,1:Lk:end)

in the time domain, i.e., xd{k} is obtained from x(k,:) by means of downsampling by the factor Lk. This produces N/Lk == Nk samples. That is, for a band that is Nk bins wide, we obtain Nk time-domain samples for each STFT frame, when critically sampled. (At the full rate, we obtain N samples from each channel for each frame.)

We see that taking an inverse FFT of only the bins in a given channel computes the critically downsampled signal for that channel. Downsampling factors between 1 and Lk can be implemented by choosing a zero-padding factor for the band somewhere between 1 and Lk samples.

Note that this filter bank has the perfect reconstruction (PR) property [287]. That is, the original input signal can be exactly reconstructed (to within a delay and possible scale factor) from the channel signals. The PR property follows immediately from the exact invertibility of the discrete Fourier transform. Specifically, we can recover the original signal frame by taking an FFT of each channel-signal frame and abutting the spectral bins so computed to form the original frame spectrum. Of course, the underlying STFT (uniform filter bank) must be PR as well, as it routinely is in practice.

[How to cite this work] [Order a printed hardcopy] [Comment on this page via email]

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2022-02-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

Inverse Transforming STFT Bin Groups

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1. Copyright © 2022-02-28 by Julius O. Smith III Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2022-02-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University