One of the oldest and most successful parametric models for sound is
the sinusoidal model. Conceptually, sinusoidal models are
rooted in basic Fourier theory, which states that any sound
can be expressed mathematically as a sum of sinusoids
[243,22,31,133]. For periodic
sounds, the relevant sinusoids are all harmonics of a
fundamental frequency $f_1$:

$$x(t) = \sum_{k=0}^{\infty} A_k \cos(2\pi k f_1 t + \phi_k) \qquad (7.1)$$
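As a concrete illustration of the harmonic sinusoidal model $x(t) = \sum_k A_k \cos(2\pi k f_1 t + \phi_k)$, the following minimal Python sketch samples a finite number of terms at $t = n/f_s$. The function name `harmonic_tone` and its parameters are hypothetical, chosen for this example only. It assumes that every harmonic $k f_1$ lies below the Nyquist frequency $f_s/2$, so no aliasing occurs.

```python
import math

def harmonic_tone(f1, amps, phases, fs, num_samples):
    """Sample x(t) = sum_k A_k cos(2*pi*k*f1*t + phi_k) at t = n/fs.

    amps[k] and phases[k] hold A_k and phi_k for k = 0, 1, 2, ...
    (k = 0 is the constant term). Assumes (len(amps) - 1) * f1 < fs / 2.
    """
    if len(amps) != len(phases):
        raise ValueError("amps and phases must have the same length")
    out = []
    for n in range(num_samples):
        t = n / fs
        out.append(sum(a * math.cos(2 * math.pi * k * f1 * t + p)
                       for k, (a, p) in enumerate(zip(amps, phases))))
    return out

# 100 Hz fundamental at 8 kHz: the waveform repeats every 80 samples.
x = harmonic_tone(100.0, [0.0, 1.0, 0.5, 0.25],
                  [0.0, 0.0, math.pi / 2, 0.0], 8000, 240)
```

Because every component is a harmonic of $f_1$, the sampled signal repeats with period $f_s/f_1$ samples whenever that ratio is an integer.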
Aperiodic sounds can similarly be expressed as a continuous sum of sinusoids at potentially all frequencies in the range of human hearing:

$$x(t) = \int_0^{\infty} A(f) \cos[2\pi f t + \phi(f)]\, df$$
Sinusoidal models are most appropriate for ``tonal'' sounds such as
spoken or sung vowels, or the sounds of musical instruments in the
string, wind, brass, and ``tonal percussion'' families. Ideally, one
sinusoid suffices to represent each harmonic or overtone. To represent the ``attack'' and ``decay'' of
natural tones, sinusoidal components are multiplied by an
amplitude envelope that varies over time. That is, the
amplitude $A_k$ in (7.1) is a slowly varying function of
time, $A_k = A_k(t)$. Similarly, to allow pitch variations such as vibrato, the phase
$\phi_k$ may be modulated in various ways, $\phi_k = \phi_k(t)$. (The frequency
modulation is the time-derivative of the phase modulation.)
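To make the time-varying amplitude and phase concrete, here is a minimal Python sketch of a single partial with an amplitude envelope and vibrato. The envelope shape (linear attack, exponential decay) and all default parameter values are illustrative assumptions, not taken from the text, and `enveloped_partial` is a hypothetical name. The phase is obtained by accumulating the instantaneous frequency sample by sample, which is the discrete counterpart of the statement that frequency modulation is the time-derivative of phase modulation.

```python
import math

def enveloped_partial(f0, fs, num_samples, attack_s=0.05, decay_s=0.5,
                      vib_rate_hz=5.0, vib_depth_hz=3.0):
    """One sinusoidal partial A(t) * cos(phi(t)) with attack/decay and vibrato.

    Envelope shape and default parameters are illustrative assumptions.
    """
    out = []
    phase = 0.0
    for n in range(num_samples):
        t = n / fs
        # Amplitude envelope A(t): linear attack up to 1, then exponential decay.
        if t < attack_s:
            amp = t / attack_s
        else:
            amp = math.exp(-(t - attack_s) / decay_s)
        out.append(amp * math.cos(phase))
        # Instantaneous frequency f(t) = f0 + sinusoidal vibrato. The phase is
        # the running integral of 2*pi*f(t), so f(t) is its time-derivative.
        inst_freq = f0 + vib_depth_hz * math.sin(2 * math.pi * vib_rate_hz * t)
        phase += 2 * math.pi * inst_freq / fs
    return out

y = enveloped_partial(440.0, 8000, 4000)
```

Accumulating the phase, rather than evaluating $\cos(2\pi f(t)\,t)$ directly, avoids the spurious frequency deviations that appear when a time-varying frequency is simply multiplied by $t$.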
Sinusoidal models are extremely effective. Perhaps the main reason for this is that the ear focuses most acutely on peaks in the spectrum of a sound [163,278]. For example, when there is a strong spectral peak at a particular frequency, it tends to mask lower-level sound energy at nearby frequencies. As a result, the ear-brain system is, to a first approximation, a ``spectral peak analyzer''. In modern audio coders [15,184], exploiting masking has resulted in an order-of-magnitude data compression, on average, with no loss of quality according to listening tests [24]. Thus, we may say more specifically that, to first order, the ear-brain system acts like a ``top ten percent spectral peak analyzer''.
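The ``spectral peak analyzer'' idea can be illustrated with a toy Python sketch: compute a magnitude spectrum, find its local maxima, and keep only the strongest ones. This is not a perceptual or masking model. The naive $O(N^2)$ DFT is used only to keep the example dependency-free, and the 10% cap (a fraction of the number of spectral bins) is merely an echo of the phrase in the text. The function names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (adequate for small N)."""
    n_pts = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                    for n in range(n_pts)))
            for k in range(n_pts // 2 + 1)]

def top_peaks(mag, fraction=0.10):
    """Return local-maximum bins, strongest first, capped at fraction * len(mag)."""
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]
    peaks.sort(key=lambda k: mag[k], reverse=True)
    keep = max(1, math.ceil(fraction * len(mag)))
    return peaks[:keep]

# Two sinusoids centered exactly on bins 10 and 30 of a 256-point DFT.
N = 256
x = [math.cos(2 * math.pi * 10 * n / N) + 0.5 * math.cos(2 * math.pi * 30 * n / N)
     for n in range(N)]
mag = magnitude_spectrum(x)
ranked = top_peaks(mag)
```

With bin-centered sinusoids, the two strongest peaks are bins 10 and 30, and every other bin holds only rounding noise.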
For noise-like sounds, such as wind, scraping sounds, or breath noise in a flute, sinusoidal models are relatively expensive, requiring many sinusoids across the audio band to model noise. It is therefore helpful to combine a sinusoidal model with some kind of noise model, such as pseudo-random numbers passed through a filter [227].
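A minimal Python sketch of the ``pseudo-random numbers passed through a filter'' noise model follows. The choice of uniform white noise and a one-pole lowpass filter $y[n] = (1-p)\,w[n] + p\,y[n-1]$ is an assumption for illustration, and `filtered_noise` is a hypothetical name. In practice the filter would be shaped to match the measured noise spectrum of the sound.

```python
import random

def filtered_noise(num_samples, pole=0.9, seed=0):
    """Uniform white noise w[n] in [-1, 1] shaped by a one-pole lowpass.

    y[n] = (1 - pole) * w[n] + pole * y[n-1]; larger `pole` (0 <= pole < 1)
    gives a darker, more low-frequency noise. Seeded for reproducibility.
    """
    rng = random.Random(seed)
    y_prev = 0.0
    out = []
    for _ in range(num_samples):
        w = rng.uniform(-1.0, 1.0)
        y_prev = (1.0 - pole) * w + pole * y_prev
        out.append(y_prev)
    return out

breath = filtered_noise(8000, pole=0.95)
```

Two parameters (a seed and a pole) stand in for what would otherwise require many sinusoids spread across the audio band.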
Another situation in which sinusoidal models are inefficient is at sudden transients in a sound, such as the click-like onset of a percussive sound. From Fourier theory, we know that transients, too, can be modeled exactly, but only with large numbers of sinusoids at exactly the right phases and amplitudes. To obtain a more compact signal model, it is better to introduce an explicit transient model which works together with sinusoids and filtered noise to represent the sound more parsimoniously. Another advantage of an explicit transient model is that transients can be preserved during time-compression or expansion [132]. That is, when a sound is stretched (without altering its pitch), it is usually desirable to preserve the transients (i.e., to keep their local time scales unchanged) and simply translate them to new times.
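The cost of representing a transient with sinusoids can be checked numerically. In the following Python sketch (`cosine_sum` is a hypothetical name), the average $\frac{1}{N}\sum_{k=0}^{N-1}\cos(2\pi k n/N + \theta_k)$ is a unit impulse at $n=0$ when every phase $\theta_k$ is zero, so a single click needs all $N$ sinusoids. When the same components are given random phases, the energy is smeared across the whole frame and the click disappears.

```python
import math
import random

def cosine_sum(num_points, phases):
    """x[n] = (1/N) * sum_{k=0}^{N-1} cos(2*pi*k*n/N + phases[k])."""
    return [sum(math.cos(2 * math.pi * k * n / num_points + phases[k])
                for k in range(num_points)) / num_points
            for n in range(num_points)]

N = 64
aligned = cosine_sum(N, [0.0] * N)   # zero phases: unit impulse at n = 0
rng = random.Random(1)
scrambled = cosine_sum(N, [rng.uniform(0.0, 2 * math.pi) for _ in range(N)])
```

Both signals use the same 64 unit-amplitude cosines and differ only in their phases, which is why a purely sinusoidal description of a sharp onset is both large and phase-sensitive.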
In view of the foregoing remarks, a complete and efficient additive synthesis calls for sines+noise+transients, at a minimum. In addition, it is useful to superimpose spectral weightings to implement filtering directly in the frequency domain; for example, the formants of the human voice are conveniently impressed on the spectrum in this way [158]. An interesting avenue for future research is the pursuit of new spectral modeling primitives and operators which are useful for modeling important aspects of sound in the frequency domain. Henceforth, we will refer to this general topic as spectral modeling synthesis (SMS).
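Spectral weighting can be sketched by multiplying each harmonic amplitude by a frequency-domain envelope $W(f)$ evaluated at that harmonic's frequency. In the Python sketch below, the bell-shaped (Lorentzian) bump per formant and the formant centers and bandwidths in `VOWEL_FORMANTS` are hypothetical illustrative values, not data from the text. The function names are also hypothetical.

```python
def formant_weight(freq_hz, formants):
    """W(f): sum of bell-shaped bumps, one per (center_hz, bandwidth_hz) pair."""
    return sum(1.0 / (1.0 + ((freq_hz - fc) / bw) ** 2) for fc, bw in formants)

def impress_formants(source_amps, f1, formants):
    """Weight harmonic k (k = 1, 2, ...) by W(k * f1); source_amps[k-1] is A_k."""
    return [a * formant_weight(k * f1, formants)
            for k, a in enumerate(source_amps, start=1)]

# Hypothetical formant (center, bandwidth) values in Hz, for illustration only.
VOWEL_FORMANTS = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]
amps = impress_formants([1.0] * 30, 100.0, VOWEL_FORMANTS)
```

Because the weighting acts on the amplitude list rather than on a time-domain signal, the same envelope can be reapplied unchanged when the fundamental $f_1$ changes, which keeps the formants fixed while the pitch moves.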
The subsequent sections provide a summary review of selected aspects of spectral modeling, with emphasis on applications in musical sound synthesis and effects.