Next  |  Prev  |  Top  |  Index  |  JOS Index  |  JOS Pubs  |  JOS Home  |  Search

Spectrum Analysis of Sinusoids

Sinusoidal components are fundamental building blocks of sound. Any sound that can be described as a ``tone'' is naturally and efficiently modeled as a sum of windowed sinusoids over short, ``stationary'' time segments (e.g., on the order of 20 ms or more in the case of voice). Over longer time segments, tonal sounds are modeled efficiently by modulated sinusoids, where the amplitude and frequency modulations are relatively slow. This is the model used in additive synthesis (discussed further below). Of course, thanks to Fourier's theorem, every sound can be expressed as a sum of sinusoids having fixed amplitude and frequency, but this is a highly inefficient model for non-tonal and changing sounds. Perhaps more fundamentally from an audio modeling point of view, the ear is quite sensitive to peaks in the short-time spectrum of a sound, and a spectral peak is naturally modeled as a sinusoidal component which has been shaped by some kind of ``window function'' or ``amplitude envelope'' in the time domain.

Because spectral peaks are so relatively dominant in hearing, sinusoidal models are ``unreasonably effective'' in capturing the tonal aspects of sound in a compact, easy-to-manipulate form. Computing a sinusoidal model entails fitting the parameters of a sinusoid (amplitude, frequency, and sometimes phase) to each peak in the spectrum of each time-segment. In typical sinusoidal modeling systems, the sinusoidal parameters are linearly interpolated from one time segment to the next, and this usually provides a perceptually smooth variation over time. (Higher order interpolation has also been used.) Modeling sound as a superposition of modulated sinusoids in this way is generally called additive synthesis [233].

Additive synthesis is not the only sound modeling method that requires sinusoidal parameter estimation for its calibration to desired signals. For the same reason that additive synthesis is so effective, we routinely calibrate any model for sound production by matching the short-time Fourier transform, and in this matching process, spectral peaks are heavily weighted (especially at low frequencies in the audio range). Furthermore, when the model parameters are few, as in the physical model of a known musical instrument, the model parameters can be determined entirely by the amplitudes and frequencies of the sinusoidal peaks. In such cases, sinusoidal parameter estimation suffices to calibrate non-sinusoidal models.

Pitch detection is another application in which spectral peaks are ``explained'' as harmonics of some estimated fundamental frequency. The harmonic assumption is an example of a signal modeling constraint. Model constraints provide powerful means for imposing prior knowledge about the source of the sound being modeled.

Another application of sinusoidal modeling is source separation. In this case, spectral peaks are measured and tracked over time, and the ones that ``move together'' are grouped together as separate sound sources. By analyzing the different groups separately, polyphonic pitch detection and even automatic transcription can be addressed.

A less ambitious application related to source separation may be called ``selected source modification.'' In this technique, spectral peaks are grouped, as in source separation, but instead of actually separating them, they are merely processed differently. For example, all the peaks associated with a particular voice can be given a gain boost. This technique can be very effective for modifying one track in a mix--e.g., making the vocals louder or softer relative to the background music.

For purely tonal sounds, such as freely vibrating strings or the human voice (in between consonants), forming a sinusoidal model gives the nice side effect of noise reduction. For example, almost all low-level ``hiss'' in a magnetic tape recording is left behind by a sinusoidal model. The lack of noise between spectral peaks in a sound is another example of a model constraint. It is a strong suppressor of noise since the noise is entirely eliminated in between spectral peaks. Thus, sinusoidal models can be used for signal restoration.

Next  |  Prev  |  Top  |  Index  |  JOS Index  |  JOS Pubs  |  JOS Home  |  Search

[How to cite this work]  [Order a printed hardcopy]  [Comment on this page via email]
[Watch the Video]  [Work some Exercises]  [Examination]  
``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2015-02-01 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University