It often happens that the model which is most natural from a conceptual (and manipulative) point of view is also the most effective from a compression point of view. This is because, in the ``right'' signal model for a natural sound, the model's parameters tend to vary quite slowly compared with the audio rate. As an example, physical models of the human voice and musical instruments have led to expressive synthesis algorithms which can also represent high-quality sound at much lower bit rates (such as MIDI event rates) than those normally obtained by encoding the sound directly [42,237,241,138].
The sines+noise+transients spectral model follows a natural perceptual decomposition of sound into three qualitatively different components: ``tones'', ``noises'', and ``attacks''. This compact representation for sound is useful for both musical manipulations and data compression. It has been used, for example, to create an audio compression format comparable in quality to MPEG-AAC [23,24,15] at 32 kbits/s, yet the encoded sound can be time-scaled or frequency-shifted without introducing objectionable artifacts [132].
Sinusoidal models automatically support time-scale modification (and frequency-shifting, its Fourier dual), because the original signal is replaced by oscillator amplitude and frequency envelopes, which are easily time-scaled. When the amplitude or frequency envelopes are rescaled in time, the oscillators simply run continuously under them, so the rescaling introduces no unnatural artifacts. Similarly, in sines-plus-noise synthesis, the time-varying noise filter is a time-frequency envelope that can be smoothly rescaled along either dimension.
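The envelope rescaling described above can be illustrated with a minimal sketch for a single sinusoidal partial. The function name `synthesize_partial`, its parameters (`hop`, `fs`, `stretch`), the use of linear interpolation between frames, and the running-sum phase accumulator are all assumptions made for illustration; they are not taken from any particular analysis/synthesis system cited here.

```python
import numpy as np

def synthesize_partial(amp_env, freq_env, hop, fs, stretch=1.0):
    """Render one sinusoidal partial from frame-rate envelopes.

    amp_env, freq_env: amplitude and frequency (Hz), one value per frame.
    hop: samples between frames in the original analysis (assumed).
    fs: sampling rate in Hz.
    stretch: time-scale factor (2.0 makes the sound twice as long).
    """
    amp_env = np.asarray(amp_env, dtype=float)
    freq_env = np.asarray(freq_env, dtype=float)
    n_frames = len(amp_env)

    # Time scaling only changes how many output samples span the envelopes.
    n_out = int(round((n_frames - 1) * hop * stretch)) + 1
    frame_pos = np.linspace(0.0, n_frames - 1, n_out)
    frames = np.arange(n_frames)

    # Linearly interpolate the (rescaled) envelopes up to the audio rate.
    amp = np.interp(frame_pos, frames, amp_env)
    freq = np.interp(frame_pos, frames, freq_env)

    # The oscillator runs continuously under the envelopes: its phase is the
    # running sum of instantaneous frequency, so there are no phase jumps.
    phase = 2.0 * np.pi * np.cumsum(freq) / fs
    return amp * np.sin(phase)
```

Calling the function with `stretch=2.0` roughly doubles the duration while leaving the frequency trajectory, and hence the pitch, unchanged, since only the time axis of the envelopes is resampled. In this sketch, frequency-shifting would amount to scaling `freq_env` before synthesis.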