

Inverse FFT Synthesis

When the number of partials is large, an explicit oscillator bank requires a significant amount of computation, and it becomes more efficient to use the inverse FFT to synthesize large ensembles of sinusoids [30,221,126,125,123]. This method gives the added advantage of allowing non-sinusoidal components such as filtered noise to be added in the frequency domain [224,227].

Inverse-FFT (IFFT) synthesis was apparently introduced by Hal Chamberlin in his classic 1980 book ``Musical Applications of Microprocessors'' [30]. His early method consisted of simply setting individual FFT bins to the desired amplitude and phase, so that the inverse FFT would efficiently synthesize a sum of fixed-amplitude, fixed-frequency sinusoids in the time domain.
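As a minimal sketch of this bin-setting idea (not Chamberlin's own code), the following writes an amplitude and phase into one FFT bin per partial, mirrors it into the conjugate bin so the output is real, and takes an inverse FFT. The names `ifft` and `synth_frame` are hypothetical, the partial frequencies are assumed to fall exactly on bin centers, and a small pure-Python radix-2 inverse FFT stands in for a library routine.

```python
import cmath
import math

def ifft(X):
    """Recursive radix-2 inverse FFT, unscaled (no 1/N); len(X) must be a power of 2."""
    n = len(X)
    if n == 1:
        return list(X)
    even, odd = ifft(X[0::2]), ifft(X[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def synth_frame(partials, n_fft):
    """Sum of fixed sinusoids from (bin, amplitude, phase) triples, 0 < bin < n_fft/2."""
    X = [0j] * n_fft
    for k, amp, phase in partials:
        # A*cos(2*pi*k*n/N + phase) is two complex exponentials of amplitude A/2.
        X[k] += 0.5 * amp * cmath.exp(1j * phase)
        X[n_fft - k] += 0.5 * amp * cmath.exp(-1j * phase)
    return [v.real for v in ifft(X)]

# Two partials, at bins 3 and 10 of a 64-point frame.
frame = synth_frame([(3, 1.0, 0.0), (10, 0.5, math.pi / 4)], 64)
```

The cost of the inverse FFT does not depend on how many bins are filled, which is why this approach pays off when the number of partials is large.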

This idea was extended by Rodet and Depalle [221] to include shaped amplitudes in the time domain. Instead of writing isolated FFT bins, they wrote entire main lobes into the buffer, where the main lobes corresponded to the desired window shape in the time domain. (Side lobes of the window transform were neglected.) They chose the triangular window ($\hbox{asinc}^2$ main-lobe shape), thereby implementing a linear cross-fade from one frame to the next in the time domain.
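Main-lobe writing can be sketched as follows. This is an illustration under stated assumptions, not the Rodet-Depalle implementation: it uses a periodic Hann window in place of the triangular window, because the Hann window's transform has exactly three nonzero bins when the sinusoid lies on a bin center, which makes the on-bin case exact and easy to check. The names `window_transform` and `synth_windowed_frame` are hypothetical. Only positive-frequency bins are written, and the real part of the inverse FFT is taken.

```python
import cmath
import math

def ifft(X):
    """Recursive radix-2 inverse FFT, unscaled (no 1/N); len(X) must be a power of 2."""
    n = len(X)
    if n == 1:
        return list(X)
    even, odd = ifft(X[0::2]), ifft(X[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def window_transform(window, offset_bins):
    """Window transform sampled at a (possibly fractional) offset measured in bins."""
    n_fft = len(window)
    return sum(w * cmath.exp(-2j * math.pi * offset_bins * n / n_fft)
               for n, w in enumerate(window))

def synth_windowed_frame(partials, window, half_width=2):
    """partials: (frequency_in_bins, amplitude, phase) triples.

    Writes only the bins within half_width of each frequency (the main lobe);
    side lobes of the window transform are neglected.
    """
    n_fft = len(window)
    X = [0j] * n_fft
    for freq, amp, phase in partials:
        first = math.ceil(freq - half_width)
        last = math.floor(freq + half_width)
        for k in range(first, last + 1):
            X[k % n_fft] += amp * cmath.exp(1j * phase) * window_transform(window, k - freq)
    return [v.real / n_fft for v in ifft(X)]

n_fft = 64
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
# A partial halfway between bins 10 and 11: four main-lobe bins are written.
frame = synth_windowed_frame([(10.5, 1.0, 0.3)], hann)
```

The output approximates the window times the sinusoid. The residual error comes from the neglected side lobes, so it is largest for frequencies between bin centers and for windows with high side lobes.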

A remaining drawback of IFFT synthesis was that the inverse FFT nominally synthesizes only sinusoids at a fixed frequency, so that a rapid glissando may become ``stair-cased'' in the resynthesis, stepping once in frequency per output frame.

An extension of IFFT synthesis to support linear frequency sweeps was devised by Michael Goodwin [82]. The basic idea was to tabulate window main-lobes for a variety of sweep rates. (The phase variation across the main lobe determines the frequency variation over time, and the width of the main lobe determines its extent.) In this way, frequencies could be swept within an FFT frame instead of having to be constant with a cross-fade from one static frame to the next.
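The tabulation idea can be sketched as follows. This is an illustration only, not Goodwin's implementation. It assumes a periodic Hann window, a center frequency on a bin, and a sweep rate measured as the number of bins traversed across the frame. The names `chirp_lobe` and `synth_swept_frame`, the set of tabulated sweep rates, and the lobe half-width of 6 bins are all hypothetical choices.

```python
import cmath
import math

def ifft(X):
    """Recursive radix-2 inverse FFT, unscaled (no 1/N); len(X) must be a power of 2."""
    n = len(X)
    if n == 1:
        return list(X)
    even, odd = ifft(X[0::2]), ifft(X[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def chirp_phase(n, n_fft, sweep_bins):
    """Extra phase of a linear sweep of sweep_bins bins, centered at mid-frame."""
    return math.pi * sweep_bins * ((n - n_fft / 2) / n_fft) ** 2

def chirp_lobe(window, sweep_bins, half_width):
    """Complex main-lobe samples, indexed by bin offset d, of a windowed chirp at bin 0."""
    n_fft = len(window)
    return {
        d: sum(w * cmath.exp(1j * chirp_phase(n, n_fft, sweep_bins)
                             - 2j * math.pi * d * n / n_fft)
               for n, w in enumerate(window))
        for d in range(-half_width, half_width + 1)
    }

def synth_swept_frame(partials, lobe_table, n_fft):
    """partials: (center_bin, amplitude, phase, sweep_bins); sweep_bins must be tabulated."""
    X = [0j] * n_fft
    for k0, amp, phase, sweep_bins in partials:
        for d, value in lobe_table[sweep_bins].items():
            X[(k0 + d) % n_fft] += amp * cmath.exp(1j * phase) * value
    return [v.real / n_fft for v in ifft(X)]

n_fft = 64
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
# Precompute one lobe per sweep rate; synthesis then only copies table entries.
lobe_table = {r: chirp_lobe(hann, r, half_width=6) for r in (-2, -1, 0, 1, 2)}
frame = synth_swept_frame([(12, 1.0, 0.2, 2)], lobe_table, n_fft)
```

The table entry for a sweep rate of zero reduces to the ordinary window main lobe, so static and swept partials can share one synthesis loop.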

Independently, Marques and Almeida introduced chirplet modeling of speech in 1989 [149]. This technique is based on the interesting mathematical fact that the Fourier transform of a Gaussian-windowed chirp remains a Gaussian pulse in the frequency domain (see Appendix C). Instead of measuring only amplitude and phase at each spectral peak, the parameters of a complex Gaussian are fit to each peak. The (complex) parameters of each Gaussian peak in the spectral model determine a Gaussian amplitude-envelope and a linear chirp rate in the time domain. Thus, both cross-fading and frequency sweeping are handled automatically by the spectral model.
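The Gaussian-chirp transform pair can be checked numerically. The sketch below assumes the parameterization $x(t) = e^{-pt^2}$ with $p = \alpha - j\beta$ and $\alpha > 0$, so that $\alpha$ sets the Gaussian envelope and $\beta$ sets the chirp rate; the transform is then $\sqrt{\pi/p}\, e^{-\omega^2/(4p)}$, whose magnitude is again Gaussian in $\omega$. The function names and the integration limits are hypothetical choices made for the example.

```python
import cmath
import math

def gaussian_chirp_ft_closed_form(alpha, beta, omega):
    """Fourier transform of exp(-p t^2), p = alpha - j*beta, alpha > 0."""
    p = complex(alpha, -beta)
    return cmath.sqrt(math.pi / p) * cmath.exp(-omega ** 2 / (4 * p))

def gaussian_chirp_ft_numeric(alpha, beta, omega, t_max=10.0, dt=0.005):
    """Riemann-sum approximation of the same Fourier integral over [-t_max, t_max]."""
    p = complex(alpha, -beta)
    steps = int(round(t_max / dt))
    total = 0j
    for i in range(-steps, steps + 1):
        t = i * dt
        total += cmath.exp(-p * t * t - 1j * omega * t)
    return total * dt

def magnitude_ratio(alpha, beta, omega):
    """|X(omega)| / |X(0)|, which should follow a Gaussian in omega."""
    return (abs(gaussian_chirp_ft_closed_form(alpha, beta, omega))
            / abs(gaussian_chirp_ft_closed_form(alpha, beta, 0.0)))

alpha, beta = 1.0, 2.0
closed = gaussian_chirp_ft_closed_form(alpha, beta, 1.5)
numeric = gaussian_chirp_ft_numeric(alpha, beta, 1.5)
```

Under this parameterization the spectral magnitude falls off as $e^{-\omega^2 \alpha / (4(\alpha^2+\beta^2))}$, so a faster chirp (larger $\beta$) widens the spectral peak, and the quadratic phase across the peak carries the chirp rate.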

More recent references on chirplet modeling include [177,80,81,79].

Beginning in 1999, Laroche and Dolson extended IFFT synthesis further by using raw spectral-peak regions from STFT analysis data [126,125,123]. By preserving the raw spectral peak (instead of modeling it mathematically as a window transform or complex Gaussian function), the original amplitude envelope and frequency variation are preserved for the signal component corresponding to the analyzed peak in the spectrum. To implement frequency-shifting, for example, the raw peaks (defined as ``regions of influence'' around a peak-magnitude bin) are shifted accordingly, preserving the original amplitude and phase of the FFT bins within each peak region.
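A single-frame sketch of peak-region shifting follows. It is not the Laroche-Dolson implementation: the names `find_peaks`, `regions_of_influence`, and `shift_peaks` are hypothetical, region boundaries are placed at the midpoint between adjacent peaks (one simple choice), shifts are restricted to whole bins, and any phase adjustment needed for coherence from frame to frame is omitted. The input is assumed to be the nonnegative-frequency half of one STFT frame.

```python
def find_peaks(mag):
    """Indices of local maxima of a magnitude spectrum (end bins excluded)."""
    return [k for k in range(1, len(mag) - 1)
            if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]

def regions_of_influence(n_bins, peaks):
    """Assign every bin to a peak, splitting at the midpoint between adjacent peaks."""
    bounds = [0] + [(a + b) // 2 + 1 for a, b in zip(peaks, peaks[1:])] + [n_bins]
    return [(bounds[i], bounds[i + 1]) for i in range(len(peaks))]

def shift_peaks(half_spectrum, shift_for_peak):
    """Move each raw peak region by an integer number of bins.

    shift_for_peak maps a peak bin index to its shift in bins. The complex
    values inside each region are copied unchanged, so the amplitude and phase
    relationships within the peak are preserved.
    """
    mag = [abs(v) for v in half_spectrum]
    peaks = find_peaks(mag)
    out = [0j] * len(half_spectrum)
    for peak, (lo, hi) in zip(peaks, regions_of_influence(len(mag), peaks)):
        d = shift_for_peak(peak)
        for k in range(lo, hi):
            if 0 <= k + d < len(out):
                out[k + d] += half_spectrum[k]  # overlapping regions add
    return out

# Toy frame with two peaks, at bins 5 and 11.
spectrum = [0j] * 16
spectrum[4], spectrum[5], spectrum[6] = 0.5 + 0j, 1 + 1j, 0.5 + 0j
spectrum[10], spectrum[11], spectrum[12] = 0.2j, 0.8 + 0j, 0.2 + 0j
shifted = shift_peaks(spectrum, lambda peak: 2)
```

Because the bins are copied rather than re-synthesized from a model, whatever amplitude envelope and frequency variation the analysis window captured in the peak travels with it to the new frequency.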




``Spectral Audio Signal Processing'', by Julius O. Smith III, (March 2007 Draft).
Copyright © 2008-05-20 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University