Sinusoidal Modeling Systems

With the phase vocoder, the instantaneous amplitude and frequency are
normally computed only for each ``channel filter''. A consequence of
using a fixed-frequency filter bank is that the frequency of each
sinusoid cannot vary outside the bandwidth of its
channel bandpass filter, unless one is willing to combine channel
signals in some fashion, which requires extra work. Ordinarily, the
bandpass center frequencies are harmonically spaced. *I.e.*, they are
integer multiples of a base frequency. So, for example, when analyzing
a piano tone, the intrinsic progressive sharpening of its partial
overtones leads to some sinusoids falling ``in the cracks'' between
adjacent filter channels. This is not an insurmountable condition
since the adjacent bins can be combined in a straightforward manner to
provide accurate amplitude and frequency envelopes,
but it is inconvenient and outside the original scope of the phase
vocoder (which, recall, was developed originally for speech, which is
fundamentally periodic (ignoring ``jitter'') when voiced at a constant
pitch). Moreover, it is relatively unwieldy to work with the
instantaneous amplitude and frequency signals from all of the
filter-bank channels. For these reasons, the phase vocoder has
been largely replaced by sinusoidal modeling in the
context of analysis for additive synthesis of inharmonic sounds,
except in constrained computational environments (such as real-time
systems). In sinusoidal modeling, the fixed, uniform filter-bank of
the vocoder is replaced by a *sparse, peak-adaptive* filter bank,
implemented by following magnitude peaks in a sequence of FFTs. The
efficiency of the split-radix, Cooley-Tukey FFT
makes it computationally feasible to implement
an enormous number of bandpass filters in a fine-grained analysis
filter bank, from which the sparse, adaptive analysis filter bank is
derived. An early paper in this area is included as Appendix H.

Thus, modern sinusoidal models can be regarded as ``pruned phase vocoders'' in that they follow only the peaks of the short-time spectrum rather than the instantaneous amplitude and frequency from every channel of a uniform filter bank. Peak-tracking in a sliding short-time Fourier transform has a long history going back at least to 1957 [210,281]. Sinusoidal modeling based on the STFT of speech was introduced by Quatieri and McAulay [221,169,222,174,191,223]. STFT sinusoidal modeling in computer music began with the development of a pruned phase vocoder for piano tones [271,246] (processing details included in Appendix H).
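The ``pruning'' step above amounts to linking each spectral peak in the current frame to a nearby peak in the previous frame, with unmatched peaks starting (``birth'') or ending (``death'') a sinusoidal track. The following is a hedged sketch of one simple greedy nearest-frequency continuation rule; the function name, the greedy matching order, and the 20 Hz deviation limit are arbitrary illustrative choices, not the specific algorithms of the papers cited above.

```python
# Hedged sketch of frame-to-frame peak continuation for sinusoidal
# tracking.  Each current-frame peak is greedily linked to the nearest
# unused previous-frame peak within max_dev_hz; peaks left unmatched
# correspond to track births (current frame) or deaths (previous frame).

def match_peaks(prev_freqs, cur_freqs, max_dev_hz=20.0):
    """Return a list of (i_prev, j_cur) index pairs of continued tracks."""
    pairs, used = [], set()
    for j, fc in enumerate(cur_freqs):
        best, best_dev = None, max_dev_hz
        for i, fp in enumerate(prev_freqs):
            dev = abs(fc - fp)
            if i not in used and dev <= best_dev:
                best, best_dev = i, dev
        if best is not None:
            used.add(best)
            pairs.append((best, j))
    return pairs
```

For example, `match_peaks([100.0, 200.0, 300.0], [102.0, 198.0, 405.0])` continues the first two tracks and leaves the 405 Hz peak as a birth (and the 300 Hz peak as a death), since 405 Hz deviates from 300 Hz by more than the allowed 20 Hz.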


Copyright © Center for Computer Research in Music and Acoustics (CCRMA), Stanford University