In the 1960s, the phase vocoder was introduced by Flanagan and Golden
based on interpreting the classical vocoder (§H.5) filter
bank as a sliding short-time Fourier transform
[65,211]. The digital computer made
it possible for the phase vocoder to easily support phase modulation
of the synthesis oscillators as well as to implement their amplitude
envelopes. Thus, in addition to computing the instantaneous amplitude
at the output of each (complex) band-pass filter, the instantaneous
phase was also computed. (Phase could be converted to frequency by
taking a time derivative.) Complex band-pass filters were implemented
by multiplying the incoming signal by $e^{-j\omega_k n}$, where
$\omega_k$ is the $k$th channel radian center-frequency, and
low-pass-filtering using a sixth-order Bessel filter.
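As a rough illustration of the analysis just described, the following sketch heterodynes a test sinusoid down to dc, low-pass filters the result, and reads off the instantaneous amplitude and frequency of one channel. It is a minimal sketch under stated assumptions, not the original implementation: a Hann-shaped FIR low-pass stands in for the sixth-order Bessel filter, and the names `hann_lowpass` and `analyze_channel`, the test frequencies, and the filter length are all hypothetical choices made for the example.

```python
import cmath
import math


def hann_lowpass(M):
    """Length-M Hann-shaped FIR low-pass with unity dc gain.

    Assumption: this is only a convenient stand-in for the
    sixth-order Bessel low-pass mentioned in the text.
    """
    w = [0.5 - 0.5 * math.cos(2.0 * math.pi * m / (M - 1)) for m in range(M)]
    total = sum(w)
    return [v / total for v in w]


def analyze_channel(x, omega_k, h):
    """Return (amplitude, frequency) envelopes for one channel.

    x       : real input samples
    omega_k : channel center frequency in radians per sample
    h       : low-pass FIR coefficients
    """
    # Heterodyne: shift the channel center frequency down to dc.
    base = [x[n] * cmath.exp(-1j * omega_k * n) for n in range(len(x))]
    M = len(h)
    # FIR low-pass, keeping only outputs with full filter support.
    y = [sum(h[m] * base[n - m] for m in range(M))
         for n in range(M - 1, len(x))]
    # A real sinusoid of amplitude A leaves A/2 at baseband.
    amp = [2.0 * abs(v) for v in y]
    # Phase derivative as a first difference; taking the angle of
    # y[i] * conj(y[i-1]) avoids explicit phase unwrapping.
    freq = [omega_k + cmath.phase(y[i] * y[i - 1].conjugate())
            for i in range(1, len(y))]
    return amp, freq


# Hypothetical test case: a sinusoid slightly above the channel center.
omega_k = 2.0 * math.pi * 0.100
omega_0 = 2.0 * math.pi * 0.102
x = [0.8 * math.cos(omega_0 * n + 0.3) for n in range(600)]
amp, freq = analyze_channel(x, omega_k, hann_lowpass(64))
print(amp[100], freq[100], omega_0)
```

The estimated frequency should sit close to `omega_0` rather than `omega_k`, while the amplitude estimate comes out slightly below 0.8 because the low-pass filter attenuates the small offset from the channel center.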
The phase vocoder also relaxed the requirement of pitch-following (needed in the vocoder), because the phase modulation computed by the analysis stage automatically fine-tuned each sinusoidal component within its filter-bank channel. The main remaining requirement was that only one sinusoidal component be present in any given channel of the filter bank; otherwise, the instantaneous amplitude and frequency computations would be based on ``beating'' waveforms instead of single sinusoids, and only single sinusoids produce the smooth amplitude and frequency envelopes necessary for good data compression.
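The one-sinusoid-per-channel requirement can be illustrated numerically. The sketch below assumes an idealized channel whose output is simply the sum of the complex sinusoids that fall inside it (no filter shaping), and compares the amplitude envelope for one component against that for two. The function name `channel_envelope` and the chosen amplitudes and frequencies are hypothetical.

```python
import cmath
import math


def channel_envelope(components, num_samples):
    """Instantaneous amplitude of an idealized channel output.

    components : list of (amplitude, radian frequency) pairs assumed
                 to lie within the same filter-bank channel
    """
    return [abs(sum(a * cmath.exp(1j * w * n) for a, w in components))
            for n in range(num_samples)]


# One sinusoid in the channel: the envelope is constant.
single = channel_envelope([(1.0, 2.0 * math.pi * 0.10)], 200)

# Two sinusoids in the same channel: the envelope "beats" at the
# difference frequency (0.01 cycles per sample here).
double = channel_envelope([(1.0, 2.0 * math.pi * 0.10),
                           (0.5, 2.0 * math.pi * 0.11)], 200)

ripple_single = max(single) - min(single)
ripple_double = max(double) - min(double)
print(ripple_single, ripple_double)
```

With two components, the envelope swings between the sum and the difference of the two amplitudes, so it is no longer a slowly varying quantity that can be coarsely sampled.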
The phase vocoder extends the vocoder to include the starting phase of each filter-bank channel output signal. (After time zero, the phase in each channel is given by the starting phase plus the integral of the instantaneous frequency in that channel.) Unlike the hardware implementations of the channel vocoder, the phase vocoder is typically implemented in software on top of a Short-Time Fourier Transform (STFT), and it uses additive synthesis for reconstructing the signal from its amplitude and ``phase derivative'' (instantaneous frequency) spectrum [65]. Time-scale modification and frequency shifting were early applications of the phase vocoder [65].
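The relationship between starting phase, instantaneous frequency, and the resynthesized signal can be sketched for a single channel as follows. This is an illustrative sketch only: the integral of instantaneous frequency is approximated by a running sum, the crude sample-and-hold stretch in `hold` is an assumption made for brevity (practical systems interpolate the envelopes), and both function names are hypothetical.

```python
import math


def hold(envelope, factor):
    """Stretch an envelope in time by repeating each value `factor` times."""
    return [v for v in envelope for _ in range(factor)]


def synthesize_channel(amp, freq, start_phase):
    """One additive-synthesis oscillator driven by amplitude and
    instantaneous-frequency envelopes (radians per sample)."""
    out = []
    phase = start_phase
    for a, w in zip(amp, freq):
        out.append(a * math.cos(phase))
        # Running sum approximating the integral of instantaneous frequency.
        phase += w
    return out


# Hypothetical envelopes: constant amplitude and frequency.
omega = 2.0 * math.pi * 0.05
amp = [1.0] * 100
freq = [omega] * 100

y = synthesize_channel(amp, freq, 0.3)

# Time-scale modification: stretching the envelopes doubles the
# duration while the oscillator frequency, and hence pitch, is unchanged.
y_slow = synthesize_channel(hold(amp, 2), hold(freq, 2), 0.3)
print(len(y), len(y_slow))
```

Scaling the values in `freq` instead of stretching the envelopes would shift the frequency of the channel while leaving the duration unchanged, which corresponds to the other early application named above.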
The phase vocoder can also be considered an early subband coder [247]. Since the mid-1970s, subband coders have typically been implemented using the STFT [65,187,10]. In the field of perceptual audio compression, additional compression has been obtained using undersampled filter banks that provide aliasing cancellation [249], the first example being the Princen-Bradley filter bank [189].
The phase vocoder was also adopted as the analysis framework of choice for additive synthesis (sinusoidal modeling) in computer music [162]. (See §H.8 for more about additive synthesis.)
Today, the term ``vocoder'' has become somewhat synonymous in the audio research world with ``modified short-time Fourier transform'' [187,53]. In the commercial musical instrument world, it implies a keyboard instrument with a microphone that performs cross synthesis (§9.3).