The Phase Vocoder is an algorithm for timescale modification of audio. One way to understand it is as stretching or compressing the time base of a spectrogram. This changes the temporal characteristics of a sound while retaining its short-time spectral characteristics. If the spectrogram is narrowband (the analysis window is longer than a pitch cycle, so the individual harmonics are resolved), then preserving the spectral characteristics also preserves the pitch, which avoids the 'slowing down the tape' pitch drop. The only complication is that the phase associated with each bin in the modified spectrogram has to be 'fixed up' to maintain the dphase/dtime of the original. This ensures that successive windows line up correctly in the overlap-add reconstruction.
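Written out per bin, the fix-up amounts to the recursion below. The notation is introduced here for illustration and is not taken from the original. X[m,k] is the STFT at frame m and bin k, R is the hop size, N is the FFT length, t_n is the (possibly fractional) input frame read for output frame n, and princ(·) wraps an angle into [-π, π). The wrapped form makes the expected advance 2πkR/N of bin k explicit. It differs from the raw frame-to-frame phase difference only by a multiple of 2π, so the resulting Y is the same either way.

```latex
\begin{aligned}
m_n &= \lfloor t_n \rfloor, \qquad \alpha_n = t_n - m_n \\
|Y[n,k]| &= (1-\alpha_n)\,|X[m_n,k]| + \alpha_n\,|X[m_n+1,k]| \\
\phi[0,k] &= \angle X[0,k] \\
\phi[n+1,k] &= \phi[n,k] + \frac{2\pi k R}{N}
  + \operatorname{princ}\!\left(\angle X[m_n+1,k] - \angle X[m_n,k] - \frac{2\pi k R}{N}\right) \\
Y[n,k] &= |Y[n,k]|\, e^{j\phi[n,k]}
\end{aligned}
```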
This implementation first calculates the short-time Fourier transform of the signal using 'stft'. Then 'pvsample' builds a modified spectrogram array by sampling the original array at a sequence of fractional time values, interpolating the magnitudes and fixing up the phases as it goes. The resulting time-frequency array can be inverted back into a sound with 'istft'. The 'pvoc' script is a wrapper that performs all three steps for a fixed time-scaling factor (greater than one to speed up, less than one to slow down). However, the underlying 'pvsample' routine would also support arbitrary timebase variation (freezing, reversal, modulation), given a suitable interface for specifying the time path.
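The three steps can be sketched in Python with NumPy. This is a translation for illustration only and is not Dan Ellis's Matlab code. The function names mirror 'stft', 'pvsample', 'istft' and 'pvoc', but several details are assumptions of this sketch: the periodic Hann window, the hop of a quarter window (n_fft // 4), the (bins × frames) array layout, and the squared-window normalization in istft.

```python
import numpy as np


def _hann(n):
    # Periodic Hann window: its squared copies overlap-add to a constant at hop n/4.
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def stft(x, n_fft=1024, hop=256):
    """Complex STFT, shape (n_fft//2 + 1 bins, frames). Assumes len(x) >= n_fft."""
    win = _hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)


def istft(X, n_fft=1024, hop=256):
    """Windowed overlap-add, normalized by the summed squared synthesis window."""
    win = _hann(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=0)
    length = n_fft + hop * (X.shape[1] - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i in range(X.shape[1]):
        y[i * hop:i * hop + n_fft] += frames[:, i] * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    # Avoid dividing by ~0 at the outer edges, where the window vanishes.
    return np.where(norm > 1e-10, y / np.maximum(norm, 1e-10), 0.0)


def pvsample(X, t, hop, n_fft):
    """Sample STFT X at fractional frame times t (0 <= t <= n_frames - 1)."""
    n_bins = X.shape[0]
    # Guard column of zeros so frame floor(t) + 1 always exists.
    Xp = np.hstack([X, np.zeros((n_bins, 1), dtype=complex)])
    # Expected phase advance per hop at each bin's centre frequency.
    dphi = 2 * np.pi * hop * np.arange(n_bins) / n_fft
    ph = np.angle(X[:, 0])
    Y = np.zeros((n_bins, len(t)), dtype=complex)
    for col, tt in enumerate(t):
        i0 = int(np.floor(tt))
        frac = tt - i0
        a, b = Xp[:, i0], Xp[:, i0 + 1]
        # Linearly interpolate magnitudes between the two neighbouring frames.
        mag = (1 - frac) * np.abs(a) + frac * np.abs(b)
        Y[:, col] = mag * np.exp(1j * ph)
        # Phase fix-up: advance by the input's frame-to-frame phase difference,
        # expressed as expected advance plus a deviation wrapped to [-pi, pi).
        dp = np.angle(b) - np.angle(a) - dphi
        dp -= 2 * np.pi * np.round(dp / (2 * np.pi))
        ph = np.mod(ph + dphi + dp, 2 * np.pi)  # keep the accumulator bounded
    return Y


def pvoc(x, rate, n_fft=1024):
    """Time-scale x by rate (>1 faster, <1 slower) without changing pitch."""
    hop = n_fft // 4
    X = stft(x, n_fft, hop)
    t = np.arange(0, X.shape[1] - 1, rate)  # evenly spaced fractional frame times
    return istft(pvsample(X, t, hop, n_fft), n_fft, hop)
```

With a 1 s, 440 Hz tone at 16 kHz, `pvoc(x, 0.75)` returns a longer signal whose spectral peak stays near 440 Hz. A custom time path can be passed to `pvsample` directly to get the timebase variations described above. For example, `np.full(200, 10.0)` freezes on frame 10, and `np.arange(X.shape[1] - 1, 0, -1)` reads the frames backwards.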
I analyzed the Matlab Phase Vocoder by Dan Ellis to learn and understand one possible implementation of the algorithm, alongside the theory in the Music 420 reader.
Here's an example of how to use pvoc to slow down a voice recording (sampled at 16 kHz) to 3/4 speed:
»[d,sr]=audioread('voice.wav');
»% 1024 samples is 64 ms at 16 kHz; a rate of .75 gives 3/4 speed
»y=pvoc(d,.75,1024);
»% Compare original and resynthesis
»sound(d,sr)
»sound(y,sr)