

Spectral Modeling Synthesis

One of the oldest and most successful parametric models for sound is the sinusoidal model. Conceptually, sinusoidal models are rooted in basic Fourier theory, which states that any sound $ s(t)$ can be expressed mathematically as a sum of sinusoids [243,22,31,133]. For periodic sounds, the relevant sinusoids are all harmonics of a fundamental frequency $ \omega_1$:

$\displaystyle s(t) = \sum_{k=1}^K A_k \sin(\omega_k t + \phi_k)$ (8.1)

where $ t$ denotes time in seconds, $ \omega_k = k\cdot 2\pi/P$ is the $ k$th harmonic radian frequency, $ P$ is the period in seconds, $ A_k$ is the amplitude of the $ k$th sinusoidal component, $ \phi_k$ is its phase, and $ K$ is the number of the highest audible harmonic.
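As a minimal sketch of the harmonic sum (8.1), the following evaluates $ s(t)$ directly from its definition, indexing harmonics from $ k=1$ so that the first component is the fundamental. The function name `harmonic_sum`, the sampling rate, and the amplitude and phase values are illustrative assumptions, not taken from the text.

```python
import math

def harmonic_sum(t, period, amps, phases):
    """Evaluate s(t) = sum_k A_k sin(w_k t + phi_k) with w_k = k * 2*pi / P.

    amps[0] and phases[0] belong to the fundamental (k = 1).
    """
    return sum(
        a * math.sin(k * 2.0 * math.pi / period * t + phi)
        for k, (a, phi) in enumerate(zip(amps, phases), start=1)
    )

# Hypothetical example values: a 100 Hz tone with three harmonics.
P = 0.01                  # period in seconds
A = [1.0, 0.5, 0.25]      # amplitudes A_1..A_3 (assumed)
PHI = [0.0, 0.0, 0.0]     # phases phi_1..phi_3 (assumed)
fs = 8000                 # sampling rate in Hz (assumed)

# One tenth of a second of the periodic signal, sampled at t = n / fs.
samples = [harmonic_sum(n / fs, P, A, PHI) for n in range(fs // 10)]
```

Because every component frequency is an integer multiple of $ 2\pi/P$, the result repeats every $ P$ seconds, whatever amplitudes and phases are chosen.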

Aperiodic sounds can similarly be expressed as a continuous sum of sinusoids at potentially all frequencies in the range of human hearing:

$\displaystyle s(t) = \int_{0}^\Omega A_\omega \sin(\omega t + \phi_\omega)\, d\omega,$ (8.2)

where $ \Omega$ denotes the upper bound of human hearing (nominally $ 2\pi\cdot 20$ kHz).

Sinusoidal models are most appropriate for ``tonal'' sounds such as spoken or sung vowels, or the sounds of musical instruments in the string, wind, brass, and ``tonal percussion'' families. Ideally, one sinusoid suffices to represent each harmonic or overtone. To represent the ``attack'' and ``decay'' of natural tones, sinusoidal components are multiplied by an amplitude envelope that varies over time. That is, the amplitude $ A_k(t)$ in (8.1) is a slowly varying function of time. Similarly, to allow pitch variations such as vibrato, the phase $ \phi_k(t)$ may be modulated in various ways. (The frequency modulation is the time-derivative of the phase modulation.)
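The following sketch illustrates one sinusoidal component with a slowly varying amplitude envelope and vibrato. It is only one possible realization: the envelope shape (a smooth rise followed by an exponential decay), the sinusoidal vibrato, and all names and parameter values are assumptions made for illustration. The instantaneous frequency is accumulated sample by sample to obtain the phase, which is the discrete-time counterpart of frequency being the time-derivative of phase.

```python
import math

def partial_with_vibrato(n_samples, fs, f0, vib_rate, vib_depth, attack, decay):
    """One sinusoid with a time-varying amplitude and a modulated phase.

    f0, vib_rate, vib_depth are in Hz; attack and decay are time constants
    in seconds. All parameters are hypothetical illustration values.
    """
    out = []
    phase = 0.0
    for n in range(n_samples):
        t = n / fs
        # Slowly varying amplitude A(t): smooth attack, exponential decay.
        amp = (1.0 - math.exp(-t / attack)) * math.exp(-t / decay)
        # Instantaneous frequency in Hz (carrier plus vibrato deviation).
        freq = f0 + vib_depth * math.sin(2.0 * math.pi * vib_rate * t)
        out.append(amp * math.sin(phase))
        # Phase is the running integral (here, running sum) of frequency.
        phase += 2.0 * math.pi * freq / fs
    return out

# Example: 1000 samples of a 440 Hz partial with 5 Hz vibrato of +/- 6 Hz.
tone = partial_with_vibrato(1000, 8000, 440.0, 5.0, 6.0, 0.01, 0.5)
```

Accumulating the phase, rather than computing $ \sin(2\pi f(t)\,t)$ directly, keeps the waveform continuous when the frequency changes.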

Sinusoidal models are extremely effective. Perhaps the main reason for this is that the ear focuses most acutely on peaks in the spectrum of a sound [163,278]. For example, when there is a strong spectral peak at a particular frequency, it tends to mask lower-level sound energy at nearby frequencies. As a result, the ear-brain system is, to a first approximation, a ``spectral peak analyzer''. In modern audio coders [15,184], exploiting masking has resulted in an order-of-magnitude data compression, on average, with no loss of quality, according to listening tests [24]. Thus, we may say more specifically that, to first order, the ear-brain system acts like a ``top ten percent spectral peak analyzer''.

For noise-like sounds, such as wind, scraping sounds, or breath noise in a flute, sinusoidal models are relatively expensive, requiring many sinusoids across the audio band to model noise. It is therefore helpful to combine a sinusoidal model with some kind of noise model, such as pseudo-random numbers passed through a filter [227].
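A minimal sketch of such a noise model is given below: pseudo-random Gaussian samples are passed through a one-pole lowpass filter. The choice of filter, the pole location, and the function name `filtered_noise` are assumptions for illustration; any filter that shapes the noise spectrum as desired could be substituted.

```python
import random

def filtered_noise(n_samples, pole=0.9, seed=0):
    """White Gaussian noise shaped by the one-pole lowpass filter
    y[n] = (1 - pole) * x[n] + pole * y[n-1].

    pole (0 <= pole < 1) and seed are hypothetical illustration values.
    """
    rng = random.Random(seed)   # seeded so the output is reproducible
    y_prev = 0.0
    out = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)                   # pseudo-random input sample
        y_prev = (1.0 - pole) * x + pole * y_prev  # lowpass filtering
        out.append(y_prev)
    return out

noise = filtered_noise(5000)
```

Only the filter coefficient needs to be stored to describe the noise component, whereas a purely sinusoidal description of the same sound would need amplitudes, frequencies, and phases for many sinusoids across the band.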

Another situation in which sinusoidal models are inefficient is at sudden transients in a sound, such as the click-like onset of a percussive sound. From Fourier theory, we know that transients, too, can be modeled exactly, but only with large numbers of sinusoids at exactly the right phases and amplitudes. To obtain a more compact signal model, it is better to introduce an explicit transient model which works together with sinusoids and filtered noise to represent the sound more parsimoniously. Another advantage of an explicit transient model is that transients can be preserved during time-compression or expansion [132]. That is, when a sound is stretched (without altering its pitch), it is usually desirable to preserve the transients (i.e., to keep their local time scales unchanged) and simply translate them to new times.
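The inefficiency of sinusoids at a transient can be illustrated with a small discrete Fourier transform. In the sketch below, which uses assumed signal lengths and an assumed 1% significance threshold, a single-sample click spreads its energy equally over every frequency bin, while a steady sinusoid with a whole number of cycles in the frame occupies only two bins.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform of a real or complex list."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def count_significant(X, rel=0.01):
    """Number of bins whose magnitude exceeds rel times the largest magnitude."""
    peak = max(abs(v) for v in X)
    return sum(1 for v in X if abs(v) > rel * peak)

N = 64
click = [0.0] * N
click[5] = 1.0                                              # one-sample transient
tone = [math.sin(2.0 * math.pi * 4 * n / N) for n in range(N)]  # 4 cycles per frame

click_bins = count_significant(dft(click))   # 64: every bin has magnitude 1
tone_bins = count_significant(dft(tone))     # 2: bins k = 4 and k = N - 4
```

The click also requires a specific phase at each of its 64 bins so that the components cancel everywhere except at the click instant, which is why an explicit transient model is more compact.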

In view of the foregoing remarks, a complete and efficient additive synthesis model calls for sines+noise+transients, at a minimum. In addition, it is useful to superimpose spectral weightings to implement filtering directly in the frequency domain; for example, the formants of the human voice are conveniently impressed on the spectrum in this way [158]. An interesting avenue for future research is the pursuit of new spectral modeling primitives and operators which are useful for modeling important aspects of sound in the frequency domain. Henceforth, we will refer to this general topic as spectral modeling synthesis (SMS).
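As a rough sketch of a spectral weighting, the following multiplies the amplitudes of a set of sinusoidal components by a formant-like weighting function evaluated at each component's frequency. The Gaussian bump shape, the function names, and the formant centers, bandwidths, and gains are placeholder assumptions for illustration, not measured vocal-tract data.

```python
import math

def formant_weight(freq_hz, formants):
    """Spectral weighting W(f): a sum of Gaussian bumps.

    formants is a list of (center_hz, bandwidth_hz, gain) tuples (hypothetical).
    """
    return sum(
        gain * math.exp(-0.5 * ((freq_hz - center) / bw) ** 2)
        for center, bw, gain in formants
    )

def apply_spectral_weighting(freqs_hz, amps, formants):
    """Scale each sinusoid's amplitude by the weighting at its frequency."""
    return [a * formant_weight(f, formants) for f, a in zip(freqs_hz, amps)]

# Twenty harmonics of an assumed 100 Hz fundamental, all with unit amplitude.
freqs = [100.0 * k for k in range(1, 21)]
flat_amps = [1.0] * len(freqs)

# Placeholder formants: (center in Hz, bandwidth in Hz, gain).
formants = [(700.0, 100.0, 1.0), (1200.0, 120.0, 0.5)]
shaped_amps = apply_spectral_weighting(freqs, flat_amps, formants)
```

The filtering is applied to the model parameters rather than to the time-domain signal, so the weighting can be changed independently of the pitch of the underlying sinusoids.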

The subsequent sections provide a summary review of selected aspects of spectral modeling, with emphasis on applications in musical sound synthesis and effects.




``Spectral Audio Signal Processing'', by Julius O. Smith III, (March 2007 Draft).
Copyright © 2008-05-15 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University