A sines+noise analysis diagram is shown in Fig.7.7. The processing path along the top, from left to right, measures the amplitude and frequency trajectories from magnitude peaks in the STFT, as is done in typical sinusoidal modeling frameworks (based on quadratically interpolated peaks). The peak amplitude and frequency trajectories are converted back to the time domain by additive synthesis (an oscillator bank or inverse FFT), and this signal is windowed by the same analysis window and forward-transformed back into the frequency domain. The magnitude spectrum of this sines-only signal is then subtracted from the originally computed magnitude spectrum, which contains both peaks and ``noise''. The result of this subtraction is termed the residual signal. The upper spectral envelope of the residual magnitude spectrum is measured using, e.g., linear prediction.
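The analysis path above can be sketched for a single frame as follows. This is a minimal illustration, not the method used in any particular system. The assumptions are: one frame, no zero-padding, a periodic Hann window, peaks accepted if they lie within 50 dB of the largest peak, an oscillator bank with constant frequency over the frame (zero phase), negative residual values clipped to zero, and an autocorrelation-method linear-prediction fit with a small amount of diagonal loading. The function names `find_peaks`, `residual_magnitude`, and `lpc_envelope` are hypothetical.

```python
import numpy as np

def find_peaks(mag_db, thresh_db):
    """Return local maxima of a dB magnitude spectrum above thresh_db,
    refined by quadratic (parabolic) interpolation: (freqs in bins, levels in dB)."""
    freqs, levels = [], []
    for k in range(1, len(mag_db) - 1):
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        if b > a and b >= c and b > thresh_db:
            p = 0.5 * (a - c) / (a - 2 * b + c)      # vertex offset, |p| <= 1/2
            freqs.append(k + p)
            levels.append(b - 0.25 * (a - c) * p)   # interpolated peak level
    return np.array(freqs), np.array(levels)

def residual_magnitude(frame, w, thresh_rel_db=-50.0):
    """Return (|X|, residual), where residual = |X| - |sines-only spectrum|."""
    N = len(frame)
    X = np.abs(np.fft.rfft(w * frame))
    X_db = 20 * np.log10(np.maximum(X, 1e-12))
    freqs, levels_db = find_peaks(X_db, X_db.max() + thresh_rel_db)
    # A windowed sinusoid of amplitude A peaks at roughly (A/2) * sum(w).
    amps = 2 * 10 ** (levels_db / 20) / np.sum(w)
    n = np.arange(N)
    sines = np.zeros(N)
    for A, f in zip(amps, freqs):
        sines += A * np.cos(2 * np.pi * f * n / N)   # oscillator-bank resynthesis
    S = np.abs(np.fft.rfft(w * sines))               # same window, forward FFT
    # Negative differences are clipped to zero (an assumed handling).
    return X, np.maximum(X - S, 0.0)

def lpc_envelope(mag, order, N):
    """All-pole (linear-prediction) fit to a magnitude spectrum given on
    rfft bins 0..N/2, using the autocorrelation method."""
    r = np.fft.irfft(mag ** 2, n=N)[: order + 1]     # circular autocorrelation
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)                  # small diagonal loading (assumed)
    a = np.linalg.solve(R, r[1 : order + 1])          # predictor coefficients
    g = max(r[0] - a @ r[1 : order + 1], 0.0)         # prediction-error power
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n=N) # A(e^{jw}) on rfft bins
    return np.sqrt(g) / np.abs(A)

# Example: one frame of a sinusoid plus weak white noise
N = 1024
n = np.arange(N)
w = np.hanning(N + 1)[:N]                            # periodic Hann window
rng = np.random.default_rng(0)
x = np.cos(2 * np.pi * 50.3 * n / N) + 0.01 * rng.standard_normal(N)
X, resid = residual_magnitude(x, w)
env = lpc_envelope(resid, order=20, N=N)
```

The residual near bin 50 should be much smaller than the original peak, because the interpolated sinusoid accounts for most of it. The all-pole fit is one choice of "upper" envelope. It tends to follow spectral peaks rather than valleys, which is why linear prediction is a natural fit here.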
Note that instead of going back to the time domain via additive synthesis, windowing, retransforming, and computing the ``sines-only'' magnitude spectrum, each peak amplitude and frequency can be used to scale and translate a copy of the window-transform magnitude spectrum. The sum of these window transforms, one per peak, provides approximately the same sines-only magnitude spectrum, albeit without the linear interpolation from frame to frame, which serves to spread the window transforms somewhat. When inverse-FFT synthesis is used, this simplified procedure becomes exact, because all sinusoids are constrained to have constant frequency during one inverse-FFT frame (in the basic implementation of IFFT synthesis; see §7.6.2 and §C.5 for more advanced alternatives).
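The shortcut described above can be sketched by evaluating the window transform directly at the frequency offset of each bin from each peak. This is a minimal illustration under two assumptions. First, the magnitudes of the per-peak window transforms are simply added, which ignores the phase interaction between overlapping transforms. That interaction is small only when peaks are more than a main-lobe width apart. Second, the negative-frequency image of each sinusoid is ignored. The function names `window_transform_mag` and `sines_only_spectrum` are hypothetical.

```python
import numpy as np

def window_transform_mag(w, omega):
    """|W(omega)| = |sum_n w[n] exp(-j*omega*n)|, evaluated directly (DTFT)."""
    n = np.arange(len(w))
    return np.abs(np.exp(-1j * np.outer(omega, n)) @ w)

def sines_only_spectrum(amps, freqs_bins, w):
    """Approximate the sines-only magnitude spectrum on rfft bins 0..N/2 by
    summing one window-transform magnitude per peak, scaled by A/2 and
    translated to the peak frequency (given in bins)."""
    N = len(w)
    k = np.arange(N // 2 + 1)
    S = np.zeros(N // 2 + 1)
    for A, f in zip(amps, freqs_bins):
        S += 0.5 * A * window_transform_mag(w, 2 * np.pi * (k - f) / N)
    return S

# Compare with windowing and transforming two constant-frequency sinusoids
N = 1024
n = np.arange(N)
w = np.hanning(N + 1)[:N]          # periodic Hann window
amps, freqs = [0.8, 0.3], [50.3, 200.7]
x = sum(A * np.cos(2 * np.pi * f * n / N + 0.4) for A, f in zip(amps, freqs))
exact = np.abs(np.fft.rfft(w * x))
approx = sines_only_spectrum(amps, freqs, w)
```

Around each peak, `approx` agrees closely with `exact`, because each sinusoid has constant frequency within the frame. With frame-to-frame frequency interpolation in an oscillator bank, the two would differ by the main-lobe spreading mentioned above.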
A sines+noise synthesis diagram is shown in Fig.7.8.
The peak amplitude and frequency trajectories may be subjected
to modifications (time-scale modification, frequency shifting, virtual
formants, etc.) and are then rendered into the time domain by additive
synthesis. This is termed the deterministic part of the
synthesized signal. The stochastic part is synthesized by (in
principle) applying the residual-spectrum envelope to white noise,
again with possible modifications applied to the spectral envelope for
the noise component. In the frequency domain, this can be carried out
by simply randomizing the phase of the spectral envelope (which
functions as a filter amplitude response). That is, the filter
spectral envelope, which is stored as a real function, is given a
random phase in the interval $[-\pi,\pi)$ at each sample (bin). This
complex result is equivalent to having multiplied the real spectral
envelope by a complex spectrum consisting of unity magnitude and
random phase (white noise). In Fig.7.8, the deterministic and
stochastic components are summed after transforming to the time
domain, and this is the typical choice when an explicit oscillator
bank is used for the additive synthesis. When inverse-FFT synthesis
is used, the sum can occur in the frequency domain, so that only one
inverse FFT is required.
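The frequency-domain variant described above can be sketched as follows. Each bin of a real envelope receives a random phase, the result is added to a deterministic spectrum on the same bins, and a single inverse FFT produces each frame. Several details are assumptions that the text does not specify. The phase is drawn uniformly from $[-\pi,\pi)$. The DC and Nyquist bins are kept real so that the frame is real. The deterministic spectrum is taken as given, for example as produced by inverse-FFT synthesis, and is filled with one arbitrary bin here purely for illustration. Frames are overlap-added with a periodic Hann synthesis window at 50% overlap. The function names `noise_spectrum`, `synth_frame`, and `overlap_add` are hypothetical.

```python
import numpy as np

def noise_spectrum(envelope, rng):
    """Give a real magnitude envelope (rfft bins 0..N/2) a random phase,
    drawn uniformly from [-pi, pi), in every bin."""
    phase = rng.uniform(-np.pi, np.pi, size=len(envelope))
    phase[0] = 0.0    # the DC bin of a real signal must be real
    phase[-1] = 0.0   # so must the Nyquist bin (N even)
    return envelope * np.exp(1j * phase)

def synth_frame(det_spectrum, envelope, rng, N):
    """Sum the deterministic and stochastic spectra, then do one inverse FFT."""
    return np.fft.irfft(det_spectrum + noise_spectrum(envelope, rng), n=N)

def overlap_add(frames, hop):
    """Overlap-add frames with a periodic Hann synthesis window (assumed);
    at hop = N/2 these windows sum to a constant."""
    N = frames.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + N)
    win = np.hanning(N + 1)[:N]
    for i, fr in enumerate(frames):
        out[i * hop : i * hop + N] += win * fr
    return out

N, hop = 512, 256
rng = np.random.default_rng(1)
env = np.linspace(1.0, 0.1, N // 2 + 1)      # hypothetical residual envelope
det = np.zeros(N // 2 + 1, dtype=complex)    # hypothetical deterministic spectrum
det[10] = 100.0
frames = np.stack([synth_frame(det, env, rng, N) for _ in range(4)])
y = overlap_add(frames, hop)
```

Because the noise spectrum has unit-magnitude random phase scaled by the envelope, the magnitude spectrum of a noise-only frame reproduces the envelope exactly, while its waveform changes from frame to frame.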
For more information on sines+noise signal modeling, see Xavier Serra's CCRMA PhD thesis [224], the SMS Home Page (listen to the examples!), and related publications [226], [11,132].