Analysis Window (Step 1)

The choice of the analysis window is important. It determines the trade-off between time and frequency resolution, which affects the smoothness of the spectrum and the detectability of the frequency peaks. The most commonly used windows are called Rectangular, Triangular, Hamming, Hanning, Kaiser, and Chebyshev. Harris [7,14] gives a good discussion of these windows and many others.

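To understand the effect of the window, let's look at what happens to a sinusoid when we Fourier transform it. A complex sinusoid of the form

$$x(n) = A e^{j\omega_x nT} \tag{7}$$

when multiplied by a window $w(n)$ of odd length $M$, transforms to

$$X_w(\omega) = \sum_{n=-\infty}^{\infty} x(n)\,w(n)\,e^{-j\omega nT} = A \sum_{n=-(M-1)/2}^{(M-1)/2} w(n)\,e^{-j(\omega-\omega_x)nT} \tag{8}$$

$$X_w(\omega) = A\,W(\omega-\omega_x) \tag{9}$$

where $A$ is the amplitude of the sinusoid, $\omega_x$ its radian frequency, $T$ the sampling interval, and $W(\omega)$ the transform of the window.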

Thus, the transform of a windowed sinusoid, isolated or part of a complex tone, is the transform of the window scaled by the amplitude of the sinusoid and centered at the sinusoid's frequency.
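The statement above can be checked numerically. The following is a minimal sketch, not code from PARSHL: it assumes a sampling interval of one sample (T = 1), an odd-length Hamming window centered on n = 0, and arbitrary illustrative values for the amplitude and the two frequencies. The helper names `hamming` and `dtft` are hypothetical.

```python
import cmath
import math

def hamming(M):
    """Symmetric Hamming window of odd length M as {n: w(n)}, centered on n = 0."""
    half = (M - 1) // 2
    return {n: 0.54 + 0.46 * math.cos(2 * math.pi * n / (M - 1))
            for n in range(-half, half + 1)}

def dtft(samples, omega):
    """Fourier transform at radian frequency omega (rad/sample) of {n: value}."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in samples.items())

M = 31                           # window length (illustrative)
A = 0.5                          # sinusoid amplitude (illustrative)
omega_x = 2 * math.pi * 0.10     # sinusoid frequency, rad/sample
omega = 2 * math.pi * 0.13       # frequency at which both sides are evaluated

w = hamming(M)
windowed = {n: A * cmath.exp(1j * omega_x * n) * w[n] for n in w}

lhs = dtft(windowed, omega)              # transform of the windowed sinusoid
rhs = A * dtft(w, omega - omega_x)       # scaled, shifted window transform

print(abs(lhs - rhs))                    # difference is at rounding-error level
```

Both sides agree to within floating-point rounding, for any choice of `omega`.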

All the standard windows are real and symmetric and have spectra of a sinc-like shape (as in Fig. 1). Considering the applications of the program, our choice will be mainly determined by two of the spectrum's characteristics: the width of the main lobe, defined as the number of bins (DFT-sample points) between the two zero crossings, and the highest side-lobe level, which measures how many dB the highest side lobe lies below the main-lobe peak. Ideally we would like a narrow main lobe (good resolution) and a very low side-lobe level (no cross-talk between FFT channels). The choice of window determines this trade-off. For example, the rectangular window has the narrowest main lobe, 2 bins, but the first side lobe is very high, about -13 dB relative to the main-lobe peak. The Hamming window has a wider main lobe, 4 bins, and its highest side lobe is roughly 43 dB down. The Blackman window has a worst-case side-lobe rejection of 58 dB, which is good for audio applications. A very different window, the Kaiser, allows control of the trade-off between the main-lobe width and the highest side-lobe level: if we want less main-lobe width we get a higher side-lobe level, and vice versa. Since control of this trade-off is valuable, the Kaiser window is a good general-purpose choice.
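The two characteristics discussed above can be measured directly from a window's spectrum. The sketch below is illustrative only and is not taken from PARSHL. It assumes symmetric, odd-length windows built from the textbook cosine-sum coefficients, evaluates the (real) spectrum on an oversampled frequency grid, and takes the first local minimum of the magnitude as the edge of the main lobe. The function names, the window length of 65, and the oversampling factor are all assumptions.

```python
import math

def centered_window(name, M):
    """Symmetric window of odd length M as {n: w(n)}, n = -(M-1)/2 .. (M-1)/2."""
    half = (M - 1) // 2
    coeffs = {
        "rectangular": (1.0,),
        "hamming": (0.54, 0.46),
        "blackman": (0.42, 0.50, 0.08),
    }[name]
    return {
        n: sum(a * math.cos(2 * math.pi * i * n / (M - 1))
               for i, a in enumerate(coeffs))
        for n in range(-half, half + 1)
    }

def lobe_stats(w, oversample=32):
    """Return (main-lobe width in bins, highest side-lobe level in dB)."""
    M = len(w)
    step = 2 * math.pi / (M * oversample)   # grid spacing, rad/sample
    n_freqs = M * oversample // 2 + 1       # grid covers 0 .. pi
    # A real symmetric window has a real spectrum: W(omega) = sum w(n) cos(omega n)
    mag = [abs(sum(v * math.cos(k * step * n) for n, v in w.items()))
           for k in range(n_freqs)]
    # Walk down the main lobe to its first null (first local minimum of |W|).
    null = 0
    while mag[null + 1] < mag[null]:
        null += 1
    width_bins = 2 * null / oversample      # null-to-null width, one bin = 2*pi/M
    sidelobe_db = 20 * math.log10(max(mag[null:]) / mag[0])
    return width_bins, sidelobe_db

stats = {name: lobe_stats(centered_window(name, 65))
         for name in ("rectangular", "hamming", "blackman")}
for name, (width, level) in stats.items():
    print(f"{name:12s} main lobe {width:4.2f} bins, highest side lobe {level:6.1f} dB")
```

Because the windows here use the symmetric form with M - 1 in the denominator, the measured main-lobe widths come out slightly wider than the nominal integer bin counts at short window lengths.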

Let's look at this problem in a more practical situation. To "resolve" two sinusoids separated in frequency by $\Delta$ Hz, we need (in noisy conditions) two clearly discernible main lobes; i.e., they should look something like Fig. 2. To obtain the separation shown (main lobes meet near a zero crossing), we require a main-lobe bandwidth $B_f$ in Hz such that

$$B_f = \frac{f_s B_s}{M} \le \Delta \tag{10}$$

$$\Delta = f_2 - f_1 \tag{11}$$

where $B_s$ is the main-lobe bandwidth (in bins), $f_s$ the sampling rate, $M$ is the window length, and $f_1 < f_2$ are the frequencies of the sinusoids. Thus, we need

$$M \ge \frac{B_s f_s}{\Delta} = \frac{B_s f_s}{f_2 - f_1}.$$

If $f_1$ and $f_2$ are successive harmonics of a fundamental frequency $f_0$, then $f_0 = f_2 - f_1 = \Delta$. Thus, harmonic resolution requires $B_f \le f_0$ and thus $M \ge B_s f_s / f_0$. Note that $f_s / f_0 = P$, the period in samples. Hence,

$$M \ge B_s P.$$

While the main lobe should be narrow enough to resolve adjacent peaks, it should not be narrower than necessary in order to maximize time resolution in the STFT.
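As a worked example of this sizing rule, the sketch below computes the shortest window that still resolves two sinusoids, taking the requirement to be that the window length is at least the main-lobe width in bins times the sampling rate divided by the frequency separation. The function name is hypothetical, the result is rounded up to an odd length on the assumption that odd-length windows are wanted, and the sampling rate and fundamental are illustrative values only.

```python
import math

def min_window_length(main_lobe_bins, fs, delta_hz):
    """Smallest odd window length M with M >= main_lobe_bins * fs / delta_hz."""
    m = math.ceil(main_lobe_bins * fs / delta_hz)
    return m if m % 2 == 1 else m + 1   # round up to the next odd length

fs = 44100.0   # sampling rate in Hz (illustrative)
f0 = 440.0     # fundamental; adjacent harmonics are f0 Hz apart (illustrative)

# Nominal main-lobe widths in bins for three common windows.
for name, bins in (("rectangular", 2), ("hamming", 4), ("blackman", 6)):
    print(name, min_window_length(bins, fs, f0))
```

With these values the period is about 100.2 samples, so the Hamming window needs to span a little over four periods.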

Since for most windows the main lobe is much wider than any side lobe, we can use this fact to avoid spurious peaks due to side-lobe oscillation. Any peak that is substantially narrower than the main-lobe width of the analysis window is rejected as a local maximum caused by side-lobe oscillation.
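One way to sketch this width test is shown below. It is an illustration under stated assumptions, not the peak detector used in PARSHL: the width of a peak is taken to be the distance between the valleys on either side of it, the main-lobe width must be given in the same bins as the magnitude spectrum (so it scales with any zero-padding factor), and the rejection threshold `min_fraction` is a hypothetical tuning parameter.

```python
def find_peaks_by_width(mag_db, main_lobe_bins, min_fraction=0.5):
    """Indices of local maxima whose valley-to-valley width, in bins of mag_db,
    is at least min_fraction * main_lobe_bins."""
    peaks = []
    last = len(mag_db) - 1
    for k in range(1, last):
        if mag_db[k] > mag_db[k - 1] and mag_db[k] >= mag_db[k + 1]:
            left = k
            while left > 0 and mag_db[left - 1] < mag_db[left]:
                left -= 1                 # descend to the valley on the left
            right = k
            while right < last and mag_db[right + 1] < mag_db[right]:
                right += 1                # descend to the valley on the right
            if right - left >= min_fraction * main_lobe_bins:
                peaks.append(k)
    return peaks

# Synthetic dB spectrum: one wide lobe at index 4, then two narrow ripples.
spectrum_db = [-60, -50, -30, -10, 0, -10, -30, -50, -60, -55, -60, -58, -60]
kept = find_peaks_by_width(spectrum_db, main_lobe_bins=8)
print(kept)   # [4]
```

The ripples at indices 9 and 11 are only two bins wide between valleys, so they are discarded while the wide lobe at index 4 is kept.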

A final point we want to make about windows is the choice between odd and even length. An odd-length window can be centered around the middle sample, while an even-length one does not have a mid-point sample. If one end-point is deleted, an odd-length window can be overlapped and added so as to satisfy Eq. (6). For purposes of phase detection, we prefer a zero-phase window spectrum, and this is obtained most naturally by using a symmetric window with a sample at the time origin. We therefore use odd-length windows exclusively in PARSHL.
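The zero-phase property can be illustrated with a small experiment. The sketch below is not PARSHL code: it assumes a Hamming window of length 31 and a DFT size of 64, and the helper names `dft` and `zero_phase_buffer` are hypothetical. The window is loaded into the DFT buffer with its middle sample at index 0, so that samples at negative times wrap around to the end of the buffer, and the result is compared with the same window loaded from index 0 onward.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT, adequate for a small illustration."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def zero_phase_buffer(w, N):
    """Place an odd-length window in a length-N buffer with its center at index 0."""
    half = (len(w) - 1) // 2
    buf = [0.0] * N
    for i, v in enumerate(w):
        buf[(i - half) % N] = v   # negative times wrap to the end of the buffer
    return buf

M, N = 31, 64
w = [0.54 - 0.46 * math.cos(2 * math.pi * i / (M - 1)) for i in range(M)]

W_zero = dft(zero_phase_buffer(w, N))    # center sample at the time origin
W_causal = dft(w + [0.0] * (N - M))      # window starts at index 0

max_imag_zero = max(abs(c.imag) for c in W_zero)
max_imag_causal = max(abs(c.imag) for c in W_causal)
print(max_imag_zero, max_imag_causal)
```

The centered placement gives a spectrum that is real to within rounding error, while the causal placement has the same magnitude but carries a linear phase term from the delay of (M - 1)/2 samples.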


Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University
