The PARSHL Program

This appendix is adapted from the original paper describing the PARSHL program [#!SmithSerra!#] for sinusoidal modeling of audio. While many of the main points are summarized elsewhere in the text, the PARSHL paper is included here as a source of more detailed information on carrying out elementary sinusoidal modeling of sound based on the STFT.

As mentioned in §G.7.1, the phase vocoder was a widely used
analysis tool for additive synthesis starting in the 1970s. A
difficulty with the phase vocoder, as traditionally implemented, is
that it uses a *fixed* uniform filter bank. While this works
well for periodic signals, it is relatively inconvenient for
inharmonic signals. An "inharmonic phase vocoder" called PARSHL^{H.1} was developed in 1985 to address this problem in the
context of piano signal modeling [#!SmithSerra!#]. PARSHL
worked by tracking peaks in the short-time Fourier transform (STFT),
thereby synthesizing an adaptive inharmonic FIR filter bank, replacing
the fixed uniform filter bank of the vocoder. In other respects,
PARSHL could be regarded as a phase-vocoder analysis program.

The PARSHL program converted an STFT to a set of amplitude and frequency envelopes for inharmonic, quasi-sinusoidal-sum signals. Only the most prominent peaks in the spectrum of the input signal were tracked. For quasi-harmonic sounds, such as the piano, the amplitudes and frequencies were sampled approximately once per period of the lowest frequency in the analysis band. For resynthesis, PARSHL supported additive synthesis [#!RissetAndMathews69!#] using an oscillator bank, overlap-add reconstruction from the STFT, or both.
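As a rough illustration of oscillator-bank resynthesis from such envelopes, the following Python sketch generates one hop of output by summing one sinusoid per track. It is a minimal sketch under stated assumptions, not the PARSHL code: the function name `synth_hop` and its parameters are hypothetical, amplitude and frequency are interpolated linearly across the hop, and the phase is obtained by accumulating the interpolated frequency rather than by interpolating measured phases.

```python
import numpy as np

def synth_hop(amp0, amp1, freq0, freq1, phase0, hop, fs):
    """Sum one hop of sinusoids, one per track (illustrative sketch).

    amp0/amp1 and freq0/freq1 are per-track values at the start and end of
    the hop (linear amplitude, Hz). phase0 holds the per-track starting
    phases in radians. Returns the output buffer and the per-track phases
    to use at the start of the next hop.
    """
    amp0, amp1 = np.asarray(amp0, float), np.asarray(amp1, float)
    freq0, freq1 = np.asarray(freq0, float), np.asarray(freq1, float)
    t = np.arange(hop)[:, None] / hop        # 0 .. (hop-1)/hop as a column
    amp = amp0 + (amp1 - amp0) * t           # linear amplitude ramp
    freq = freq0 + (freq1 - freq0) * t       # linear frequency ramp in Hz
    incr = 2.0 * np.pi * freq / fs           # per-sample phase increment
    # Phase at sample n is phase0 plus the increments of samples before n.
    phase = np.asarray(phase0, float) + np.cumsum(incr, axis=0) - incr
    out = np.sum(amp * np.cos(phase), axis=1)
    next_phase = phase[-1] + incr[-1]
    return out, next_phase
```

Calling `synth_hop` once per analysis frame and passing `next_phase` back in as `phase0` keeps each sinusoid continuous across hop boundaries. A track that starts or ends can be handled by setting `amp0` or `amp1` to zero for that hop.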

PARSHL followed the amplitude, frequency, and phase^{H.2} of the most
prominent peaks over time in a series of spectra, computed using the
Fast Fourier Transform (FFT). The synthesis part of the program used
the analysis parameters, or their modification, to generate a sinewave
in the output for each peak track found.
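Following peaks over time amounts to deciding, in each new frame, which detected peak continues which existing track. The Python sketch below shows one simple way to do this, a greedy nearest-frequency assignment. This rule is an assumption chosen for illustration and is not claimed to be the matching rule PARSHL used; the function name `match_peaks` and the `max_jump_hz` threshold are hypothetical.

```python
def match_peaks(track_freqs, peak_freqs, max_jump_hz):
    """Greedy nearest-frequency assignment of new peaks to existing tracks.

    Returns a list with one entry per track: the index of the matched peak,
    or None if no unclaimed peak lies within max_jump_hz of the track.
    """
    unclaimed = set(range(len(peak_freqs)))
    matches = []
    for f_track in track_freqs:
        # Closest peak that no earlier track has already claimed.
        best = min(unclaimed,
                   key=lambda i: abs(peak_freqs[i] - f_track),
                   default=None)
        if best is not None and abs(peak_freqs[best] - f_track) <= max_jump_hz:
            unclaimed.remove(best)
            matches.append(best)
        else:
            matches.append(None)  # track has no continuation in this frame
    return matches
```

In this sketch a track that receives `None` would be ramped toward zero amplitude, and any peak left unclaimed would start a new track.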

The steps carried out by PARSHL were as follows:

- Compute the STFT using the frame size, window type, FFT size, and
hop size specified by the user.
- Compute the squared magnitude spectrum in dB (that is, 20 log10 of
the STFT magnitude).
- Find the bin numbers (frequency samples) of the spectral peaks.
Parabolic interpolation is used to refine the peak location
estimates. Three spectral samples (in dB) consisting of the local peak
in the FFT and the samples on either side of it suffice to determine
the parabola used.
- The magnitude and phase of each peak are calculated from the
maximum of the parabola determined in the previous step. The parabola
is evaluated separately on the real and imaginary parts of the
spectrum to provide a complex interpolated spectrum value.
- Each peak is assigned to a frequency track by matching the
peaks of the previous frame with the current one. These tracks can be
"started up," "turned-off" or "turned-on" at any frame by
ramping in amplitude from or toward 0.
- Arbitrary modifications can be applied to the analysis
parameters before resynthesis.
- If additive synthesis is requested, a sinewave is generated for
each frequency track, and all are summed into an output buffer. The
instantaneous amplitude, frequency, and phase for each sinewave are
calculated by interpolating the values from frame to frame. The
length of the output buffer is equal to the hop size, which is
typically some fraction of the window length.
- Repeat from the first step, advancing by the hop size each
iteration until the end of the input sound is reached.
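The spectrum and peak-finding steps above can be sketched in Python as follows. This is a minimal illustration assuming NumPy, not the original PARSHL code; the function names `db_spectrum` and `interpolated_peaks` and the `max_peaks` parameter are hypothetical. It covers only the dB magnitude and the three-point parabolic refinement of peak location and height, and omits the complex (phase) interpolation.

```python
import numpy as np

def db_spectrum(frame, window, nfft):
    """Zero-padded FFT of one windowed frame, returned as magnitude in dB."""
    spectrum = np.fft.rfft(frame * window, nfft)
    return 20.0 * np.log10(np.abs(spectrum) + 1e-12)  # offset avoids log(0)

def interpolated_peaks(mag_db, max_peaks=10):
    """Find local maxima and refine each with a three-point parabolic fit.

    Returns fractional bin positions and interpolated heights in dB,
    ordered from the most prominent peak downward.
    """
    k = np.arange(1, len(mag_db) - 1)
    is_peak = (mag_db[k] > mag_db[k - 1]) & (mag_db[k] >= mag_db[k + 1])
    bins = k[is_peak]
    # Keep only the most prominent peaks (largest dB value first).
    bins = bins[np.argsort(mag_db[bins])[::-1][:max_peaks]]
    a, b, c = mag_db[bins - 1], mag_db[bins], mag_db[bins + 1]
    # Vertex of the parabola through (-1, a), (0, b), (1, c).
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)
    peak_bin = bins + offset
    peak_db = b - 0.25 * (a - c) * offset
    return peak_bin, peak_db
```

A fractional bin position converts to Hz as `peak_bin * fs / nfft`, where `fs` is the sampling rate. Zero-padding (an FFT size larger than the frame size) makes the spectral samples denser, which generally reduces the bias of the parabolic fit.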

The following sections provide further details.


Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University