Using the phase vocoder to compute amplitude and frequency envelopes
for additive synthesis works best for quasi-periodic signals. For
inharmonic signals, the vocoder analysis method can be unwieldy: the
restriction of one sinusoid per subband leads to many ``empty'' bands
(since radix-2 FFT filter banks are always uniformly spaced). As a
result, we have to compute many more filter bands than are actually
needed, and the empty bands need to be ``pruned'' in some way (*e.g.*,
based on an energy detector within each band). The unwieldiness of a
uniform filter bank for tracking inharmonic partial overtones through
time led to the development of sinusoidal modeling based on the STFT,
as described in §G.11.2 below.
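
As a rough illustration of the energy-based pruning mentioned above, the following Python sketch keeps only those bands whose mean-square amplitude lies within a chosen number of decibels of the strongest band. The function name `prune_empty_bands`, the `threshold_db` parameter, and the sample envelope data are all hypothetical, chosen only for this example; a practical energy detector would likely be time-varying and tuned to the signal.

```python
import math

def prune_empty_bands(band_envelopes, threshold_db=-60.0):
    """Return indices of bands judged non-empty by a simple energy detector.

    band_envelopes: list of per-band amplitude envelopes (lists of floats).
    threshold_db:   bands weaker than this, relative to the strongest band,
                    are treated as empty (hypothetical default).
    """
    # Mean-square amplitude of each band's envelope.
    energies = [sum(a * a for a in env) / len(env) for env in band_envelopes]
    max_energy = max(energies)
    if max_energy == 0.0:
        return []  # every band is silent
    kept = []
    for k, energy in enumerate(energies):
        if energy == 0.0:
            continue  # exactly empty band
        level_db = 10.0 * math.log10(energy / max_energy)
        if level_db >= threshold_db:
            kept.append(k)
    return kept

# Four illustrative bands: silent, strong, near-silent, moderate.
bands = [
    [0.0, 0.0, 0.0],
    [1.0, 0.9, 0.8],
    [1e-5, 1e-5, 1e-5],
    [0.3, 0.3, 0.2],
]
kept_bands = prune_empty_bands(bands)
```

Here only the strong and moderate bands survive; the silent band and the band roughly 100 dB below the strongest one are discarded.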

Another limitation of the phase-vocoder analysis was that it did not
capture the attack transient very well in the computed amplitude and
frequency envelopes. This is because an attack transient typically
only partially fills an STFT analysis window. Moreover, filter-bank
amplitude and frequency envelopes provide an inefficient model for
signals that are *noise*-like, such as a flute with a breathy
attack. These limitations are addressed by sinusoidal modeling,
sines+noise modeling, and sines+noise+transients modeling, as
discussed starting in §10.4 below.

The phase vocoder was not typically implemented as an *identity
system*, mainly because of the large data reduction applied to the
envelopes (piecewise-linear approximation). However, it *could* be
used as an identity system by keeping the envelopes at the full signal
sampling rate and retaining the initial *phase* information for each
channel. Instantaneous phase is
then reconstructed as the initial phase plus the time-integral of the
instantaneous frequency (given by the frequency envelope).
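
The phase reconstruction just described can be sketched for a single channel as follows. This is a minimal illustration, not the historical implementation: the function name `resynthesize_channel`, the envelope values, and the sampling rate are assumptions made for the example, and the time integral of frequency is approximated by a simple running sum at the sampling rate.

```python
import math

def resynthesize_channel(amp_env, freq_env_hz, initial_phase, sample_rate):
    """Rebuild one channel from full-rate envelopes and its initial phase.

    amp_env, freq_env_hz: hypothetical per-sample amplitude and frequency
    envelopes (frequency in Hz). initial_phase is in radians.
    """
    out = []
    phase = initial_phase
    for amp, freq_hz in zip(amp_env, freq_env_hz):
        out.append(amp * math.cos(phase))
        # Running-sum approximation of the time integral of the
        # instantaneous frequency, converted to radians per sample.
        phase += 2.0 * math.pi * freq_hz / sample_rate
    return out

# Constant envelopes should reproduce a plain sinusoid exactly
# (up to floating-point rounding).
sample_rate = 8000.0
n_samples = 64
amp_env = [0.5] * n_samples
freq_env_hz = [440.0] * n_samples
initial_phase = math.pi / 3

y = resynthesize_channel(amp_env, freq_env_hz, initial_phase, sample_rate)
reference = [
    0.5 * math.cos(2.0 * math.pi * 440.0 * n / sample_rate + initial_phase)
    for n in range(n_samples)
]
max_error = max(abs(a - b) for a, b in zip(y, reference))
```

Discarding `initial_phase` would still give the correct frequency trajectory, but the channel would no longer line up sample-for-sample with the original, which is why the initial phase is needed for an identity system.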

Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University