In the analysis phase, sinusoidal peaks are measured over time in a
sequence of FFTs, and these peaks are grouped into tracks across time.
If the time advance from one FFT to the next is fixed (5 ms is a
typical choice for speech analysis), then we obtain uniformly
sampled amplitude and frequency trajectories as the result of the
analysis. The sampling rate of these amplitude and frequency
envelopes is equal to the frame rate of the analysis. (If the
time advance between FFTs is $\Delta$ ms, then the frame rate is
defined as $1000/\Delta$ Hz; e.g., a 5 ms advance gives a frame rate
of 200 Hz.) For resynthesis using inverse
FFTs, these data may be used unmodified. For resynthesis using a bank
of sinusoidal oscillators, on the other hand, we must interpolate
the frame-rate envelopes up to the signal sampling rate (typically
$44.1$ kHz or higher).
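As a rough illustration of the frame-rate arithmetic and of the frame-to-frame linear interpolation commonly used in computer music, the following Python sketch upsamples a frame-rate envelope to the signal sampling rate. The function names `frame_rate_hz` and `upsample_linear`, and the example break-point values, are hypothetical; the sketch assumes each envelope value sits exactly at a frame time $m\Delta$ and that there are at least two frames. It is a minimal sketch, not a description of any particular analysis/resynthesis package.

```python
import math


def frame_rate_hz(hop_ms: float) -> float:
    """Analysis frame rate in Hz for an FFT time advance of hop_ms milliseconds."""
    return 1000.0 / hop_ms


def upsample_linear(frames: list[float], hop_ms: float, fs: float) -> list[float]:
    """Piecewise-linearly interpolate a frame-rate envelope up to sampling rate fs.

    Assumes frames[m] is the envelope value at time m * hop_ms / 1000 seconds
    and that len(frames) >= 2. Returns one value per signal sample from t = 0
    up to (and including, if it falls on a sample) the last frame time.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to interpolate")
    hop_samples = fs * hop_ms / 1000.0  # frame advance in signal samples (may be fractional)
    n_out = math.floor((len(frames) - 1) * hop_samples) + 1
    out = []
    for n in range(n_out):
        pos = n / hop_samples               # fractional frame index of sample n
        m = min(int(pos), len(frames) - 2)  # index of the left break-point
        frac = pos - m                      # position between break-points m and m+1
        out.append((1.0 - frac) * frames[m] + frac * frames[m + 1])
    return out


# Example: a 5 ms hop gives a 200 Hz frame rate; at 44.1 kHz each frame spans 220.5 samples.
amp_frames = [0.0, 0.5, 1.0, 0.8]           # hypothetical amplitude break-points
freq_frames = [440.0, 442.0, 441.0, 440.0]  # hypothetical frequency break-points (Hz)
a_hat = upsample_linear(amp_frames, hop_ms=5.0, fs=44100.0)
f_hat = upsample_linear(freq_frames, hop_ms=5.0, fs=44100.0)
print(frame_rate_hz(5.0), len(a_hat), len(f_hat))
```

Note that the hop in samples need not be an integer (220.5 here), so the sketch works with a fractional frame index rather than assuming a whole number of samples per frame. A higher-order scheme such as a cubic spline would replace only the single interpolation line inside the loop.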
It is typical in computer music to linearly interpolate the
amplitude and frequency trajectories from one frame to the next
[248]. Higher-order interpolation of so-called envelope
break-points (e.g., using cubic splines) was also developed at CCRMA
in the late 1970s, but for tonal sounds linear interpolation is
usually sufficient. The higher-order envelopes did not see much use,
presumably because their greater complexity was not offset by any
significant benefit. Let's
call the piecewise-linear upsampled envelopes of track $k$
$\hat{A}_k(t)$ and $\hat{\omega}_k(t)$,
defined now for all $t$ at the normal signal sampling rate. For
steady-state tonal sounds, the phase may be discarded at this stage
and redefined as the integral of the instantaneous frequency when
needed: