Next  |  Prev  |  Up  |  Top  |  Index  |  JOS Index  |  JOS Pubs  |  JOS Home  |  Search

Summary of STFT Computation Using the FFT

  1. Read $ M$ samples of the input signal $ x$ into a local buffer of length $ N\geq M$ which is initially zeroed

    $\displaystyle \tilde{x}_m(n) \isdef x(n+mR), \qquad n=-M_h,
\ldots\,,-1,0,1,\ldots\,,
M_h
$

    where $ x_m$ is called the $ m$th frame of the input signal, and $ M\isdef 2M_h+1$ is the frame length (which we assume is odd for reasons to be discussed later). The time advance $ R$ (in samples) from one frame to the next is called the hop size or step size.

  2. Multiply the data frame pointwise by a length $ M$ spectrum analysis window $ w(n), n=-M_h,\ldots\,,M_h$ to obtain the $ m$th windowed data frame:

    $\displaystyle \tilde{x}_m^w(n) \isdef \tilde{x}_m(n) w(n), \qquad n=-M_h,\ldots\,,M_h
$

  3. Extend $ \tilde{x}_m^w$ with zeros on both sides to obtain a zero-padded windowed data frame:

    $\displaystyle \tilde{x}_m^{w,z}(n) \isdef \left\{\begin{array}{ll}
\tilde{x}_m^...
...frac{N}{2}}-1 \\ [5pt]
0, & -{\frac{N}{2}}\leq n < -M_h \\
\end{array}\right.
$

    where $ N$ is chosen to be a power of two larger than $ M$. The number $ N/M$ is the zero-padding factor; as discussed in §4.3, the zero-padding factor is the interpolation factor for the spectrum, i.e., each FFT bin is replaced by $ N/M$ bins, interpolating the spectrum using ideal bandlimited interpolation [243], where the ``band'' in this case is the $ M$-sample nonzero duration of $ \tilde{x}_m^w$ in the time domain.

  4. Take a length $ N$ FFT of $ \tilde{x}_m^{w,z}$ to obtain the STFT at time $ m$:

    $\displaystyle \tilde{X}_m^{w,z}(e^{j\omega_k })=\sum _{n=-N/2}^{N/2-1} \tilde{x}_m^{w,z}(n) e^{-j\omega_k n T}
$

    where $ \omega_k = 2\pi k f_s / N $, and $ f_s=1/T$ is the sampling rate in Hz. The STFT bin number is $ k$.


Next  |  Prev  |  Up  |  Top  |  Index  |  JOS Index  |  JOS Pubs  |  JOS Home  |  Search

[How to cite this work]  [Order a printed hardcopy]

``Spectral Audio Signal Processing'', by Julius O. Smith III, (March 2007 Draft).
Copyright © 2008-05-20 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University
CCRMA  [About the Automatic Links]