This section describes further details of the sines+noise+transients (S+N+T) system of Scott Levine [127], a relatively recent contribution to sinusoidal modeling.
Figure 9.20 shows the time-frequency map used in the S+N+T system of Scott Levine [127]. Vertical line spacing in the time-frequency map indicates the time resolution of the underlying multiresolution STFT, and the horizontal line spacing indicates its frequency resolution. The time waveform appears below the time-frequency map.
For transients, an interval of data including the transient is simply encoded using MPEG-2 AAC. The transient interval in Fig. 9.20 extends from approximately 47 to 115 ms. (This interval can be tighter, as discussed further below.)
Between transients, the signal model consists of sines+noise below 5 kHz and amplitude-modulated noise above. The spectrum from 0 to 5 kHz is divided into three octaves ("multiresolution sinusoidal modeling"). The time step-size varies from 25 ms in the low-frequency band (where the frequency resolution is highest) down to 6 ms in the third octave (where the frequency resolution is four times lower). Above 5 kHz, noise substitution is performed, as discussed further below.
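As a rough illustration of the multiresolution layout just described, the following Python sketch tabulates three octave bands below 5 kHz together with a step size per band. The exact band edges and the halving of the step size from one octave to the next are assumptions made for illustration (they are consistent with the 25 ms and roughly 6 ms figures quoted above, but are not taken from [127]); the function name octave_bands is likewise hypothetical.

```python
def octave_bands(f_max_hz=5000.0, n_bands=3, low_hop_ms=25.0):
    """Return a list of (f_lo_hz, f_hi_hz, hop_ms), lowest band first.

    Assumptions (not from the source): band edges sit at octave
    divisions of f_max_hz, and the time step halves in each higher band.
    """
    bands = []
    for k in range(n_bands):
        f_hi = f_max_hz / 2 ** (n_bands - 1 - k)   # upper edge of band k
        f_lo = 0.0 if k == 0 else f_hi / 2          # lowest band starts at 0 Hz
        hop_ms = low_hop_ms / 2 ** k                # finer time steps higher up
        bands.append((f_lo, f_hi, hop_ms))
    return bands


for f_lo, f_hi, hop_ms in octave_bands():
    print(f"{f_lo:7.1f} to {f_hi:7.1f} Hz: step {hop_ms:5.2f} ms")
```

Under these assumptions the top band gets a step of 6.25 ms, one quarter of the lowest band's 25 ms, which matches the factor-of-four trade between time and frequency resolution noted above.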
Figure 9.21 shows a similar time-frequency map in which the transient interval depends on frequency. This allows a tighter interval enclosing the transient, and follows audio perception more closely (see Appendix F).
Figure 9.22 illustrates the nature of the noise modeling used. The energy in each Bark band is summed, and this is used as the gain for the noise in that band at that frame time.
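The per-band energy summation can be sketched in Python as follows. This is a minimal illustration, not the implementation in [127]: the band edges are the commonly tabulated critical-band edges in Hz, the input is assumed to be a list of spectral magnitudes for bins 0 through N/2 of a single frame, and the names BARK_EDGES_HZ and bark_band_energies are hypothetical.

```python
# Commonly tabulated critical-band (Bark) edges in Hz; an assumption here,
# since the source does not list the edges it uses.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700, 9500, 12000, 15500]


def bark_band_energies(magnitudes, sample_rate_hz, fft_size,
                       edges_hz=BARK_EDGES_HZ):
    """Sum squared magnitudes of one frame's spectrum into Bark bands.

    magnitudes[k] is the magnitude of FFT bin k (k = 0 .. fft_size // 2).
    Bins at or above the last edge are ignored.
    """
    energies = [0.0] * (len(edges_hz) - 1)
    for k, mag in enumerate(magnitudes):
        f_hz = k * sample_rate_hz / fft_size        # bin center frequency
        for b in range(len(energies)):
            if edges_hz[b] <= f_hz < edges_hz[b + 1]:
                energies[b] += mag * mag            # accumulate band energy
                break
    return energies


# Toy frame: 8-point FFT at 16 kHz, so bins lie at 0, 2, 4, 6, and 8 kHz.
frame_energies = bark_band_energies([1.0, 2.0, 3.0, 4.0, 5.0], 16000, 8)
```

Each entry of frame_energies would then serve as the noise gain for its band at that frame time; in the system described here only the bands above 5 kHz are treated this way. Whether the gain is kept as an energy or converted to an amplitude (square root) is left open in this sketch.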
Figure 9.23 shows the frame gain versus time for a particular Bark band (top) and the piecewise linear envelope made from it (bottom). As illustrated in Figures 9.20 and 9.21, the step size for all of the Bark bands above 5 kHz is approximately 3 ms.
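To make the piecewise linear envelope concrete, the Python sketch below evaluates such an envelope at 3 ms frame times by interpolating linearly between breakpoints. The breakpoint values are invented for illustration, the function name eval_envelope is hypothetical, and the question of how breakpoints are selected from the measured frame gains is not addressed here.

```python
def eval_envelope(breakpoints, t_ms):
    """Linearly interpolate a list of (time_ms, gain) breakpoints at t_ms.

    Breakpoints must be sorted by strictly increasing time. Outside their
    range the envelope is held at the first or last gain.
    """
    if t_ms <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, g0), (t1, g1) in zip(breakpoints, breakpoints[1:]):
        if t_ms <= t1:
            return g0 + (g1 - g0) * (t_ms - t0) / (t1 - t0)
    return breakpoints[-1][1]


# Made-up breakpoints for one Bark band: a quick rise, then a slower decay.
breakpoints = [(0.0, 0.0), (6.0, 1.0), (18.0, 0.25)]
frame_times_ms = [3.0 * n for n in range(7)]        # 0, 3, ..., 18 ms
envelope = [eval_envelope(breakpoints, t) for t in frame_times_ms]
```

Storing only the three breakpoints in place of seven frame gains shows the kind of data reduction a piecewise linear envelope offers, at the cost of smoothing over frame-to-frame fluctuations.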
For more information on this sines+noise+transients system, see Scott Levine's CCRMA PhD/EE thesis [127].