The Short-Time Fourier Transform (STFT)

Computation of the STFT consists of the following steps:

1. Read $ M$ samples of the input signal $ x$ into a local buffer,

$\displaystyle x_m(n) \mathrel{\stackrel{\mathrm{\Delta}}{=}}x(n+mR), \qquad n=-M_h,-M_h+1,\ldots\,,-1,0,1,\ldots\,,M_h-1,M_h$

where $ x_m$ is called the $ m$th frame of the input signal, and $ M\mathrel{\stackrel{\mathrm{\Delta}}{=}}2M_h+1$ is the frame length (which we assume is odd for reasons to be discussed later). The time advance $ R$ (in samples) from one frame to the next is called the hop size.

2. Multiply the data frame pointwise by a length $ M$ spectrum analysis window $ w(n), n=-M_h,\ldots\,,M_h$ to obtain the $ m$th windowed data frame:

$\displaystyle \tilde{x}_m(n) \mathrel{\stackrel{\mathrm{\Delta}}{=}}x_m(n) w(n), \qquad n=-{\frac{M-1}{2}},\ldots\,,{\frac{M-1}{2}}
$

3. Extend $ \tilde{x}_m$ with zeros on both sides to obtain a zero-padded windowed data frame:

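$\displaystyle \tilde{x}_m^\prime (n) \mathrel{\stackrel{\mathrm{\Delta}}{=}}\left\{\begin{array}{ll}
\tilde{x}_m(n), & \vert n\vert\leq {\frac{M-1}{2}} \\ [5pt]
0, & {\frac{M-1}{2}} < n \leq \frac{N}{2}-1 \\ [5pt]
0, & -\frac{N}{2}\leq n < -{\frac{M-1}{2}} \\
\end{array}\right.
$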

where $ N$ is the FFT size, chosen to be a power of two larger than $ M$. The number $ N/M$ is called the zero-padding factor.

4. Take a length $ N$ FFT of $ \tilde{x}_m^\prime$ to obtain the STFT at time $ m$:

$\displaystyle \tilde{x}_m^\prime (e^{j\omega_k })=\sum _{n=-N/2}^{N/2-1} \tilde{x}_m^\prime (n) e^{-j\omega_k n T}
$

where $ \omega_k = 2\pi k f_s / N $ is the frequency of bin $ k$ in radians per second, and $ f_s=1/T$ is the sampling rate in Hz. The STFT bin number is $ k$. Each bin $ \tilde{x}_m^\prime (e^{j\omega_k })$ of the STFT can be regarded as a sample of the complex signal at the output of a lowpass filter whose input is $ x(n) e^{-j\omega_k n T}$; this signal is $ x(n)$ frequency-shifted so that frequency $ \omega_k $ is moved to 0 Hz. In this interpretation, the hop size $ R$ is the downsampling factor applied to each lowpass-filter output, and the analysis window $ w(\,\cdot\,)$ is the impulse response of that lowpass filter, which serves as the anti-aliasing filter for the downsampling.
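The four steps, and the filter-bank reading of a single bin, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not PARSHL's own implementation: the function names `stft_frame` and `lowpass_channel` are hypothetical, the sampling period is normalized to $ T=1$ (so that $ \omega_k T = 2\pi k/N$), the window is assumed symmetric, and frame $ m$ is assumed to be centered on sample $ mR$ of a zero-based array. The zero-padded frame is stored with $ n=0$ at the start of the FFT buffer and negative $ n$ wrapped to the end, which yields the same sum as indexing from $ -N/2$ to $ N/2-1$.

```python
import numpy as np

def stft_frame(x, m, R, w, N):
    """Steps 1-4 for frame m: return the N STFT bins (T = 1 assumed)."""
    M = len(w)
    Mh = (M - 1) // 2
    center = m * R  # assumption: frame m is centered on sample m*R
    if M % 2 != 1 or N < M or center < Mh or center + Mh >= len(x):
        raise ValueError("need odd M <= N and a frame lying inside x")
    frame = x[center - Mh:center + Mh + 1]  # step 1: samples n = -Mh..Mh
    xw = frame * w                          # step 2: apply the window
    buf = np.zeros(N, dtype=complex)        # step 3: zero-pad to length N
    buf[:Mh + 1] = xw[Mh:]                  #   n = 0..Mh at the start
    buf[N - Mh:] = xw[:Mh]                  #   n = -Mh..-1 wrapped to the end
    return np.fft.fft(buf)                  # step 4: length-N FFT

def lowpass_channel(x, k, R, w, N):
    """Filter-bank view of bin k: heterodyne, lowpass with w, downsample by R."""
    Mh = (len(w) - 1) // 2
    t = np.arange(len(x))
    shifted = x * np.exp(-2j * np.pi * k * t / N)        # move omega_k to 0 Hz
    filtered = np.convolve(shifted, w)[Mh:Mh + len(x)]   # zero-phase FIR lowpass
    return filtered[::R]                                 # keep every R-th sample

# Usage on a random test signal (all values below are arbitrary choices).
rng = np.random.default_rng(0)
x = rng.standard_normal(400)
M, N, R, k, m = 31, 64, 8, 5, 10
w = np.hanning(M)                       # symmetric window of odd length
X = stft_frame(x, m, R, w, N)
v = lowpass_channel(x, k, R, w, N)
phase = np.exp(2j * np.pi * k * m * R / N)
print(np.allclose(X[k], v[m] * phase))
```

Under these conventions the FFT bin and the downsampled filter output agree up to the linear-phase factor $ e^{j\omega_k mRT}$, which only reflects that each frame's time origin is placed at its center; their magnitudes are identical.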

The zero-padding factor $ N/M$ is the interpolation factor for the spectrum: relative to a length $ M$ transform, each bin is replaced by $ N/M$ bins, so the zero-padded FFT samples the same underlying spectrum on a denser frequency grid.
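The interpolation property can be checked numerically. The following Python/NumPy sketch is illustrative only: the helper names `zero_phase_fft` and `dtft`, the test frame, and the FFT sizes 32 and 128 are assumptions chosen for the example. It compares the zero-padded FFT against a direct evaluation of the frame's discrete-time Fourier transform, and checks that a larger FFT size only inserts new bins between the existing ones.

```python
import numpy as np

def zero_phase_fft(xw, N):
    """Length-N FFT of an odd-length frame indexed n = -Mh..Mh, zero-padded."""
    Mh = (len(xw) - 1) // 2
    buf = np.zeros(N, dtype=complex)
    buf[:Mh + 1] = xw[Mh:]     # n = 0..Mh at the start of the buffer
    buf[N - Mh:] = xw[:Mh]     # n = -Mh..-1 wrapped to the end
    return np.fft.fft(buf)

def dtft(xw, omega):
    """Direct evaluation of the frame's DTFT at radian frequencies omega."""
    Mh = (len(xw) - 1) // 2
    n = np.arange(-Mh, Mh + 1)
    return np.exp(-1j * np.outer(omega, n)) @ xw

M = 31
n = np.arange(-(M // 2), M // 2 + 1)
xw = np.hanning(M) * np.cos(0.9 * n)   # an arbitrary windowed test frame

X32 = zero_phase_fft(xw, 32)     # zero-padding factor 32/31
X128 = zero_phase_fft(xw, 128)   # zero-padding factor 128/31

# The 128 bins lie on the same continuous spectrum as the direct DTFT ...
same_curve = np.allclose(X128, dtft(xw, 2 * np.pi * np.arange(128) / 128))
# ... and every 4th bin coincides with a bin of the 32-point transform.
nested = np.allclose(X128[::4], X32)
print(same_curve, nested)
```

Zero padding therefore adds no new information about the frame; it only evaluates the same windowed spectrum at more closely spaced frequencies.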



``PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation'', by Julius O. Smith III and Xavier Serra, Proceedings of the International Computer Music Conference (ICMC-87, Tokyo), Computer Music Association, 1987.
Copyright © 2005-12-28 by Julius O. Smith III and Xavier Serra
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University