

Dudley's Channel Vocoder

The first major effort to encode speech electronically was Homer Dudley's channel vocoder (``voice coder'') [68] developed starting in October of 1928 at AT&T Bell Laboratories [245]. An overall schematic of the channel vocoder is shown in Fig.G.4.


Figure G.4: Channel or phase vocoder block diagram (input $x(t)$; channel signals $x_0(t), x_1(t), \ldots, x_{N-1}(t)$; synthesized channel signals $\hat{x}_0(t), \hat{x}_1(t), \ldots, \hat{x}_{N-1}(t)$; output $\hat{x}(t)$).

During analysis, the outputs of ten analog bandpass filters (spanning 250-3000 Hz) were rectified and lowpass-filtered to obtain an amplitude envelope for each band. In parallel, the fundamental frequency $F_0$ was measured and a voiced/unvoiced decision was made (unvoiced segments were indicated by $F_0=0$). During synthesis, a ``buzz source'' (relaxation oscillator) at pitch $F_0$ (for voiced speech) or a ``hiss source'' (for unvoiced speech) drove a set of ten matching bandpass filters, whose outputs, scaled by the transmitted amplitude envelopes, were summed to produce the reconstructed voice. While the voice quality had a quite noticeable ``unpleasant electrical accent'' [245], the bandwidth required to transmit $F_0(t)$ and the bandpass-filter gain envelopes was much less than that required to transmit the original speech signal.
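The analysis/synthesis loop just described can be sketched digitally in a few lines of Python. This is a minimal sketch under several assumptions that are not from the text: an 8 kHz sampling rate, ten log-spaced (constant-Q) bands between 250 and 3000 Hz, second-order audio-EQ-cookbook bandpass biquads standing in for Dudley's analog filters, full-wave rectification followed by a one-pole 30 Hz envelope smoother, and an impulse train and white noise in place of the buzz and hiss circuits. No gain calibration between channels is attempted, and all function names are hypothetical.

```python
import math
import random

# Minimal channel-vocoder sketch. ASSUMPTIONS (not from the text): 8 kHz sampling,
# log-spaced constant-Q bands, cookbook biquad bandpass filters, full-wave
# rectification, a 30 Hz one-pole envelope smoother, impulse-train/noise excitation.

def make_bands(f_lo, f_hi, n_bands):
    """Split [f_lo, f_hi] into log-spaced bands; return (center_hz, Q) pairs."""
    r = (f_hi / f_lo) ** (1.0 / n_bands)
    bands = []
    for k in range(n_bands):
        lo, hi = f_lo * r**k, f_lo * r**(k + 1)
        fc = math.sqrt(lo * hi)  # geometric center of the band
        bands.append((fc, fc / (hi - lo)))
    return bands

def bandpass_coeffs(fc, q, fs):
    """Biquad bandpass (0 dB peak gain), audio-EQ-cookbook form, normalized by a0."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)  # (a1, a2)
    return b, a

def biquad(x, b, a):
    """Direct-form-I second-order IIR filter applied to the list x."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def one_pole_lowpass(x, fc, fs):
    """y[n] = (1-p) x[n] + p y[n-1], unity gain at DC."""
    p = math.exp(-2.0 * math.pi * fc / fs)
    y, state = [], 0.0
    for v in x:
        state = (1.0 - p) * v + p * state
        y.append(state)
    return y

def analyze(x, fs, bands, env_cutoff=30.0):
    """Analysis: bandpass -> full-wave rectify -> lowpass, one envelope per band."""
    envs = []
    for fc, q in bands:
        b, a = bandpass_coeffs(fc, q, fs)
        rectified = [abs(v) for v in biquad(x, b, a)]
        envs.append(one_pole_lowpass(rectified, env_cutoff, fs))
    return envs

def excitation(f0_track, fs, rng):
    """Impulse train where F0 > 0 (voiced), white noise where F0 == 0 (unvoiced)."""
    e, phase = [], 0.0
    for f0 in f0_track:
        if f0 > 0.0:
            phase += f0 / fs
            if phase >= 1.0:  # one impulse per pitch period
                phase -= 1.0
                e.append(math.sqrt(fs / f0))  # height chosen for roughly unit power
            else:
                e.append(0.0)
        else:
            e.append(rng.gauss(0.0, 1.0))
    return e

def synthesize(envs, f0_track, fs, bands, seed=0):
    """Synthesis: excitation -> matching bandpass filters, each scaled by its envelope, summed."""
    rng = random.Random(seed)
    e = excitation(f0_track, fs, rng)
    out = [0.0] * len(e)
    for (fc, q), env in zip(bands, envs):
        b, a = bandpass_coeffs(fc, q, fs)
        for n, v in enumerate(biquad(e, b, a)):
            out[n] += env[n] * v
    return out

# Usage: analyze 0.5 s of a 1 kHz tone, then resynthesize it as voiced speech at F0 = 120 Hz.
fs = 8000
bands = make_bands(250.0, 3000.0, 10)  # ten bands spanning 250-3000 Hz
x = [math.sin(2 * math.pi * 1000.0 * n / fs) for n in range(fs // 2)]
envs = analyze(x, fs, bands)
y = synthesize(envs, [120.0] * len(x), fs, bands)
```

Only the envelopes `envs` and the $F_0$ track cross the channel between `analyze` and `synthesize`; the waveform `x` itself is never transmitted, which is where the bandwidth saving comes from.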

The vocoder synthesis model can be considered a source-filter model for speech that uses a nonparametric spectral model of the vocal tract, given by the output of a fixed bandpass-filter bank over time. Related efforts included the formant vocoder [190], a type of parametric spectral model, which encoded $F_0$ and the amplitude and center frequency of the first three spectral formants. See [168, pp. 2452-3] for an overview and references.
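Using the signal names of Fig.G.4, the synthesis side of this source-filter model can be summarized as follows. The excitation $e(t)$, the band impulse responses $h_k(t)$, and the gain envelopes $g_k(t)$ are notation introduced here for illustration, not taken from the text:

$$
\hat{x}(t) = \sum_{k=0}^{N-1} \hat{x}_k(t), \qquad
\hat{x}_k(t) = g_k(t)\,\bigl(h_k * e\bigr)(t),
$$

where $e(t)$ is the buzz source when $F_0(t) > 0$ and the hiss source when $F_0(t) = 0$, $h_k$ is the $k$th fixed bandpass filter, and $g_k(t)$ is the amplitude envelope measured in channel $k$ during analysis. The vocal-tract spectrum is thus carried only by the $N$ slowly varying gains $g_k(t)$ at fixed band positions, rather than by parameters such as formant frequencies.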

The original vocoder drove the filter bank with a ``buzz source'' (implemented using a relaxation oscillator) during voiced speech, and with a ``hiss source'' (implemented using the noise from a resistor) during unvoiced speech. In later speech modeling by linear prediction [162], the buzz source evolved into the more mathematically pure impulse train, and the hiss source became white noise.

The vocoder used an analog bandpass filter bank, and only the amplitude envelope was retained for each bandpass channel. When the vocoder was later reimplemented using the discrete Fourier transform on a digital computer (§G.7 below), it became simple to record both the instantaneous amplitude and phase for each channel. As a result, the name was updated to phase vocoder. Section G.7 summarizes the history of the phase vocoder, and §G.10 describes an example implementation using the STFT.
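A minimal sketch of why a DFT-based channel yields both quantities, assuming a single frame, a rectangular window, and a test cosine centered exactly on a DFT bin: the complex value in bin $k$ has a magnitude (the channel amplitude) and an angle (the channel phase), whereas the analog channel vocoder's rectifier and lowpass filter kept only an amplitude envelope. The function names are hypothetical, and this is not the STFT implementation described in §G.10.

```python
import cmath
import math

def dft_bin(x, k):
    """One DFT bin: X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    n_total = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_total)
               for n, v in enumerate(x))

def channel_amp_phase(x, k):
    """Amplitude and phase of channel k, assuming a real input and 0 < k < N/2."""
    X = dft_bin(x, k)
    # For x[n] = A cos(2 pi k n / N + phi), X[k] = (A N / 2) exp(j phi).
    return 2.0 * abs(X) / len(x), cmath.phase(X)

# Hypothetical test frame: a cosine centered exactly on bin k = 5 of a 64-point frame.
N, k, A, phi = 64, 5, 0.7, 0.4
frame = [A * math.cos(2 * math.pi * k * n / N + phi) for n in range(N)]
amp, phase = channel_amp_phase(frame, k)
print(f"amplitude={amp:.3f}, phase={phase:.3f}")  # prints amplitude=0.700, phase=0.400
```

Repeating this over successive windowed frames gives each channel an amplitude and a phase trajectory over time, rather than an amplitude envelope alone.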




``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2022-02-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University