

Analysis Window (Step 1)

The choice of the analysis window is important. It determines the trade-off between time and frequency resolution, which affects the smoothness of the spectrum and the detectability of the frequency peaks. The most commonly used windows are called Rectangular, Triangular, Hamming, Hanning, Kaiser, and Chebyshev. Harris [7,14] gives a good discussion of these windows and many others.

To understand the effect of the window, let's look at what happens to a sinusoid when we Fourier transform it. A complex sinusoid of the form

$\displaystyle x(n) = A e^{j\omega_x nT}$

when multiplied by a window $w(n)$ of odd length $M$ centered at time zero, transforms to
\begin{align}
X_w(\omega) &= \sum_{n=-\infty}^{\infty} x(n)\, w(n)\, e^{-j\omega nT} \tag{7}\\
            &= A \sum_{n=-(M-1)/2}^{(M-1)/2} w(n)\, e^{-j(\omega-\omega_x)nT} \tag{8}\\
            &= A\, W(\omega-\omega_x) \tag{9}
\end{align}

Thus, the transform of a windowed sinusoid, isolated or part of a complex tone, is the transform of the window scaled by the amplitude of the sinusoid and centered at the sinusoid's frequency.
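The shift property in Eq. (9) can be checked numerically. The following is a minimal sketch, assuming NumPy, a sampling interval $T = 1$, and illustrative values for $M$, $A$, and $\omega_x$ that do not come from the paper; the helper `dtft` is a hypothetical name introduced only for this example.

```python
import numpy as np

def dtft(x, n, omega, T=1.0):
    """Evaluate sum over n of x(n) * exp(-j*omega*n*T) for each frequency in omega."""
    return np.exp(-1j * np.outer(omega, n) * T) @ x

M = 31                                  # odd window length (illustrative value)
half = (M - 1) // 2
n = np.arange(-half, half + 1)          # time indices -(M-1)/2 .. (M-1)/2
w = np.hamming(M)                       # any real, symmetric window would do
A, omega_x = 0.5, 0.9                   # amplitude; frequency in rad/sample (T = 1)

omega = np.linspace(-np.pi, np.pi, 512)
x_windowed = A * np.exp(1j * omega_x * n) * w
X_w = dtft(x_windowed, n, omega)              # Eq. (7), restricted to the window support
shifted_W = A * dtft(w, n, omega - omega_x)   # Eq. (9): A * W(omega - omega_x)

max_err = np.max(np.abs(X_w - shifted_W))
peak_omega = omega[np.argmax(np.abs(X_w))]
print(f"max |X_w - A*W(omega - omega_x)| = {max_err:.2e}")
print(f"spectral peak at omega = {peak_omega:.3f} rad/sample")
```

The two spectra should agree to within rounding error, and the magnitude peak should fall on the frequency-grid point nearest $\omega_x$.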

Figure 1: Log magnitude of the transform of a triangle window.
\includegraphics{eps/fig1.eps}

All the standard windows are real and symmetric and have spectra of a sinc-like shape (as in Fig. 1). Considering the applications of the program, our choice will be determined mainly by two characteristics of the window spectrum: the width of the main lobe, defined as the number of bins (DFT sample points) between its two zero crossings, and the highest side-lobe level, which measures how many dB the highest side-lobe lies below the main-lobe peak. Ideally we would like a narrow main lobe (good resolution) and a very low side-lobe level (no cross-talk between FFT channels). The choice of window determines this trade-off. For example, the rectangular window has the narrowest main lobe, $2$ bins, but its first side-lobe is very high, $-13$ dB relative to the main-lobe peak. The Hamming window has a wider main lobe, $4$ bins, and its highest side-lobe is $42$ dB down. The Blackman window has a worst-case side-lobe rejection of $58$ dB, which is good for audio applications. A very different window, the Kaiser, allows control of the trade-off between main-lobe width and highest side-lobe level: a narrower main lobe comes with a higher side-lobe level, and vice versa. Since control of this trade-off is valuable, the Kaiser window is a good general-purpose choice.
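The two figures of merit described above can be estimated from a zero-padded FFT of the window. This is a rough sketch, assuming NumPy's built-in symmetric `np.hamming` and `np.blackman` windows; the function name `lobe_stats`, the zero-padding factor, and the window length are illustrative choices, not part of PARSHL. Because the symmetric window definitions differ slightly from the idealized ones, the measured values should land near, not exactly on, the figures quoted in the text.

```python
import numpy as np

def lobe_stats(w, pad=64):
    """Return (main-lobe width in bins, highest side-lobe level in dB).

    One bin is fs/M, the DFT sample spacing for a length-M transform.
    """
    M = len(w)
    mag = np.abs(np.fft.rfft(w, pad * M))   # zero-padded: `pad` samples per bin
    mag /= mag[0]                           # normalize the main-lobe peak to 0 dB
    k = 1
    while k < len(mag) - 1 and mag[k + 1] < mag[k]:
        k += 1                              # walk down the main lobe to its first null
    width_bins = 2 * k / pad                # null-to-null width (spectrum is symmetric)
    sidelobe_db = 20 * np.log10(np.max(mag[k:]))
    return width_bins, sidelobe_db

M = 101                                     # illustrative odd window length
for name, w in [("rectangular", np.ones(M)),
                ("Hamming", np.hamming(M)),
                ("Blackman", np.blackman(M))]:
    width, level = lobe_stats(w)
    print(f"{name:12s} main lobe {width:4.2f} bins, highest side-lobe {level:6.1f} dB")
```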

Figure 2: Spectrum of two clearly separated sinusoids.
\includegraphics{eps/fig2.eps}

Let's look at this problem in a more practical situation. To ``resolve'' two sinusoids separated in frequency by $\Delta$ Hz, we need (in noisy conditions) two clearly discernible main lobes; i.e., they should look something like those in Fig. 2. To obtain the separation shown (main lobes meet near a zero crossing), we require a main-lobe bandwidth $B_f$ in Hz such that

$\displaystyle B_f \leq \Delta.$

In more detail, we have
\begin{align}
B_f    &= K\frac{f_s}{M} \tag{10}\\
\Delta &= f_2 - f_1 \tag{11}
\end{align}

where $ K$ is the main-lobe bandwidth (in bins), $ f_s$ the sampling rate, $ M$ is the window length, and $ f_1, f_2$ are the frequencies of the sinusoids. Thus, we need

$\displaystyle M \geq K\frac{f_s}{\Delta} = K\frac{f_s}{f_2-f_1}.$

If $f_k$ and $f_{k+1}$ are successive harmonics of a fundamental frequency $f_1$, then $f_1 = f_{k+1}-f_k = \Delta$. Harmonic resolution therefore requires $B_f \leq f_1$, that is, $M \geq K f_s/f_1$. Note that $f_s/f_1 = T_1/T = P$, the period in samples, where $T_1 = 1/f_1$ is the fundamental period and $T = 1/f_s$ is the sampling interval. Hence,

$\displaystyle M \geq KP.$

Thus, with a Hamming window, which has a main-lobe bandwidth of $K = 4$ bins, we want at least four periods of a harmonic signal under the window. More generally, for two sinusoids at any frequencies $f_1$ and $f_2$, we want at least four periods of the difference frequency $\vert f_2-f_1\vert$ under the window.
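The bound $M \geq K f_s/\vert f_2-f_1\vert$ is easy to evaluate for a given pair of frequencies. The sketch below is only an illustration: the function name `min_window_length` and the example frequencies are hypothetical, and rounding the result up to the next odd integer is an assumption made here to match the odd-length windows used in PARSHL.

```python
import math

def min_window_length(K, fs, f1, f2):
    """Smallest odd window length M satisfying M >= K * fs / |f2 - f1|.

    K  : main-lobe width of the window in bins (e.g. 2 rectangular, 4 Hamming)
    fs : sampling rate in Hz
    f1, f2 : frequencies in Hz of the two sinusoids to be resolved
    """
    delta = abs(f2 - f1)
    if delta == 0:
        raise ValueError("f1 and f2 must differ")
    M = math.ceil(K * fs / delta)
    return M if M % 2 == 1 else M + 1   # assumption: force odd length

# Harmonic tone with a 440 Hz fundamental, Hamming window (K = 4):
# the period is P = fs/f1 samples, so the bound is M >= 4P.
M_harmonic = min_window_length(K=4, fs=44100, f1=440.0, f2=880.0)

# Two closely spaced partials 50 Hz apart need a much longer window.
M_close = min_window_length(K=4, fs=44100, f1=1000.0, f2=1050.0)

print(M_harmonic, M_close)
```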

While the main lobe should be narrow enough to resolve adjacent peaks, it should be no narrower than necessary: a narrower main lobe requires a longer window, which reduces time resolution in the STFT.

Since, for most windows, the main lobe is much wider than any side-lobe, spurious peaks due to side-lobe oscillations can be avoided: any peak that is substantially narrower than the main-lobe width of the analysis window is rejected as a local maximum caused by side-lobe oscillation.
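One way to apply this width test is sketched below. It is not the PARSHL implementation: the function `peaks_with_widths`, the width measure (distance between the local minima flanking each peak), and the threshold of half the main-lobe width are all assumptions chosen for illustration, as are the signal and FFT parameters.

```python
import numpy as np

def peaks_with_widths(mag_db):
    """Return a list of (index, width) for every interior local maximum.

    width is the distance, in spectrum samples, between the local minima
    that flank the peak.
    """
    out = []
    last = len(mag_db) - 1
    for k in range(1, last):
        if mag_db[k] > mag_db[k - 1] and mag_db[k] >= mag_db[k + 1]:
            lo = k
            while lo > 0 and mag_db[lo - 1] < mag_db[lo]:
                lo -= 1                     # descend the left flank
            hi = k
            while hi < last and mag_db[hi + 1] < mag_db[hi]:
                hi += 1                     # descend the right flank
            out.append((k, hi - lo))
    return out

M, N, K = 101, 1024, 4                      # Hamming length, FFT size, main lobe in bins
n = np.arange(M)
x = np.exp(2j * np.pi * 0.2 * n)            # one complex sinusoid at 0.2 cycles/sample
spectrum = np.fft.fft(x * np.hamming(M), N)
mag_db = 20 * np.log10(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)

main_lobe = K * N / M                       # main-lobe width in FFT samples
min_width = 0.5 * main_lobe                 # hypothetical rejection threshold
candidates = peaks_with_widths(mag_db)
kept = [k for k, width in candidates if width >= min_width]
print(f"{len(candidates)} local maxima found, kept FFT samples: {kept}")
```

With these settings the side-lobe maxima are each about one bin wide, so only the main-lobe peak near FFT sample $0.2N$ should survive the test.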

A final point about windows is the choice between odd and even length. An odd-length window can be centered on its middle sample, while an even-length window has no midpoint sample. If one endpoint is deleted, an odd-length window can be overlapped and added so as to satisfy Eq. (6). For purposes of phase detection, we prefer a zero-phase window spectrum, and this is obtained most naturally by using a symmetric window with a sample at the time origin. We therefore use odd-length windows exclusively in PARSHL.
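The zero-phase property can be illustrated with a short numerical check. This is a sketch under stated assumptions: it uses NumPy, an illustrative window length and FFT size, and the common trick of wrapping the negative-time half of the window to the end of the FFT buffer so that the middle sample sits at time zero. The variable names are hypothetical.

```python
import numpy as np

M, N = 31, 64                     # odd window length, FFT size (illustrative values)
half = (M - 1) // 2
w = np.hamming(M)                 # symmetric window; w[half] is the middle sample

# Zero-phase placement: the middle sample goes to time 0, and the samples
# for n = -half .. -1 wrap around to the end of the FFT buffer.
buf = np.zeros(N)
buf[:half + 1] = w[half:]
buf[N - half:] = w[:half]
W_zero_phase = np.fft.fft(buf)

# Causal placement for comparison: the same window starting at n = 0,
# which adds a linear phase term to the spectrum.
W_causal = np.fft.fft(w, N)

max_imag_zero_phase = np.max(np.abs(W_zero_phase.imag))
max_imag_causal = np.max(np.abs(W_causal.imag))
print(f"largest imaginary part, zero-phase: {max_imag_zero_phase:.2e}")
print(f"largest imaginary part, causal:     {max_imag_causal:.2e}")
```

The zero-phase spectrum is real up to rounding error, while the causal placement has the same magnitude but a linear phase. An even-length window has no sample to place at time zero, so this construction does not carry over directly.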





``PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation'', by Julius O. Smith III and Xavier Serra, Proceedings of the International Computer Music Conference (ICMC-87, Tokyo), Computer Music Association, 1987.
Copyright © 2005-12-28 by Julius O. Smith III and Xavier Serra
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University