Next  |  Prev  |  Top  |  JOS Index  |  JOS Pubs  |  JOS Home  |  Search


Filling the FFT Input Buffer (Step 2)

The FFT size $ N$ is normally chosen to be the first power of two that is at least twice the window length $ M$, with the difference $ N-M$ filled with zeros (``zero-padded''). The reason for increasing the FFT size and filling in with zeros is that zero-padding in the time domain corresponds to interpolation in the frequency domain, and interpolating the spectrum is useful in various ways. First, the problem of finding spectral peaks which are not exact bin frequencies is made easier when the spectrum is more densely sampled. Second, plots of the magnitude of the more smoothly sampled spectrum are less likely to confuse the untrained eye. (Only signals truly periodic in $ M$ samples should not be zero-padded. They should also be windowed only by the Rectangular window.) Third, for overlap-add synthesis from spectral modifications, the zero-padding allows for multiplicative modification in the frequency domain (convolutional modification in the time domain) without time aliasing in the inverse FFT. The length of the allowed convolution in the time domain (the impulse response of the effective digital filter) equals the number of extra zeros (plus one) in the zero padding.

If $ K$ is the number of samples in the main lobe when the zero-padding factor is 1 ($ N=M$), then a zero-padding factor of $ N/M$ gives $ KN/M$ samples for the same main lobe (and same main-lobe bandwidth). The zero-padding (interpolation) factor $ N/M$ should be large enough to enable accurate estimation of the true maximum of the main lobe after it has been frequency shifted by some arbitrary amount equal to the frequency of a sinusoidal component in the input signal. We have determined by computational search that, for a rectangularly windowed sinusoid (of any frequency), quadratic frequency interpolation (using the three highest bins) yields at least $ 0.1\%$ (of the distance from the sinc peak to the first zero-crossing) accuracy if the zero-padding factor $ N/M$ is 5 or higher.

Figure 3: Illustration of the first two steps of PARSHL. (a) Input data. (b) Windowed input data. (c) FFT buffer with the windowed input data. (d) Resulting magnitude spectrum.
\includegraphics[width=5in]{eps/fig3.eps}

As mentioned in the previous section, we facilitate phase detection by using a zero-phase window, i.e., the windowed data (using an odd length window) is centered about the time origin. A zero-centered, length $ M$ data frame appears in the length $ N$ FFT input buffer as shown in Fig. 3c. The first $ (M-1)/2$ samples of the windowed data, the ``negative-time'' portion, will be stored at the end of the buffer, from sample $ N-(M-1)/2$ to $ N-1$, and the remaining $ (M+1)/2$ samples, the zero- and ``positive-time'' portion, will be stored starting at the beginning of the buffer, from sample 0 to $ (M-1)/2$. Thus, all zero padding occurs in the middle of the FFT input buffer.


Next  |  Prev  |  Top  |  JOS Index  |  JOS Pubs  |  JOS Home  |  Search

Download parshl.pdf

``PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation'', by Julius O. Smith III and Xavier Serra, Proceedings of the International Computer Music Conference (ICMC-87, Tokyo), Computer Music Association, 1987.
Copyright © 2005-12-28 by Julius O. Smith III and Xavier Serra
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University
CCRMA  [Automatic-links disclaimer]