Spectrum Analysis of Sinusoids

Sinusoidal components are fundamental building blocks of sound. Any
sound that can be described as a ``tone'' is naturally and efficiently
modeled as a sum of windowed sinusoids over short, ``stationary'' time
segments (*e.g.*, on the order of 20 ms or more in the case of voice).
Over longer time segments, tonal sounds are modeled efficiently by
*modulated* sinusoids, where the amplitude and frequency
modulations are relatively slow. This is the model used in
*additive synthesis* (discussed further below). Of course,
thanks to Fourier's theorem, *every* sound can be expressed as a
sum of sinusoids having *fixed* amplitude and frequency, but this
is a highly inefficient model for non-tonal and changing sounds.
Perhaps more fundamentally from an audio modeling point of view, the
ear is quite sensitive to *peaks* in the short-time spectrum of a
sound, and a spectral peak is naturally modeled as a sinusoidal
component which has been shaped by some kind of ``window function'' or
``amplitude envelope'' in the time domain.
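
As a minimal illustration of this idea, the following sketch multiplies a sinusoid by a window over one 20 ms segment and locates the resulting spectral peak. The sampling rate, frequency, window type (Hann), and zero-padding factor are all assumptions chosen for the example, and the helper names `hann` and `dft_magnitude` are hypothetical. A direct DFT is used only to keep the example free of dependencies; in practice an FFT would be used.

```python
import cmath
import math

def hann(M):
    """Periodic Hann window of length M (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]

def dft_magnitude(x, N):
    """Magnitudes of the length-N DFT of x (zero-padded), bins 0..N/2."""
    mags = []
    for k in range(N // 2 + 1):
        acc = sum(xn * cmath.exp(-2j * math.pi * k * n / N)
                  for n, xn in enumerate(x))
        mags.append(abs(acc))
    return mags

fs = 8000.0   # sampling rate in Hz (assumed)
f0 = 1000.0   # sinusoid frequency in Hz (assumed)
M = 160       # segment length: 20 ms at 8 kHz
N = 512       # zero-padded DFT length (assumed)

# Windowed sinusoid: the window plays the role of the amplitude envelope.
w = hann(M)
x = [w[n] * math.cos(2 * math.pi * f0 * n / fs) for n in range(M)]

# The spectrum shows a peak (the window transform) centered on f0.
mags = dft_magnitude(x, N)
k_peak = max(range(len(mags)), key=mags.__getitem__)
f_peak = k_peak * fs / N
print(f"peak at bin {k_peak}, about {f_peak:.1f} Hz")
```

Here the frequency was chosen to fall exactly on a DFT bin. In general the peak lies between bins, which is why peak interpolation is needed.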

Because spectral peaks are so dominant in hearing,
sinusoidal models are ``unreasonably effective'' in capturing the
tonal aspects of sound in a compact, easy-to-manipulate form.
Computing a sinusoidal model entails fitting the parameters of a
sinusoid (amplitude, frequency, and sometimes phase) to each peak in
the spectrum of each time-segment. In typical sinusoidal modeling
systems, the sinusoidal parameters are *linearly interpolated*
from one time segment to the next, and this usually provides a
perceptually smooth variation over time. (Higher order interpolation
has also been used.) Modeling sound as a superposition of modulated
sinusoids in this way is generally called *additive synthesis*
[232].
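
The resynthesis step described above can be sketched as follows. This is a simplified illustration, not a complete system: it assumes each sinusoidal track is already given as one (amplitude, frequency) pair per frame, ignores the measured phase, and obtains phase by accumulating the interpolated frequency. The function names `synth_track` and `mix` and all numerical values are hypothetical.

```python
import math

def synth_track(frames, hop, fs):
    """Synthesize one sinusoidal track from per-frame (amp, freq_hz) pairs.

    Amplitude and frequency are linearly interpolated across each hop of
    `hop` samples; phase is the running sum of instantaneous frequency.
    """
    out = []
    phase = 0.0
    for (a0, f0), (a1, f1) in zip(frames, frames[1:]):
        for n in range(hop):
            t = n / hop                      # interpolation position in [0, 1)
            amp = (1 - t) * a0 + t * a1      # linear amplitude interpolation
            freq = (1 - t) * f0 + t * f1     # linear frequency interpolation
            out.append(amp * math.sin(phase))
            phase += 2 * math.pi * freq / fs
    return out

def mix(tracks):
    """Sum equal-length tracks sample by sample (the 'additive' step)."""
    return [sum(samples) for samples in zip(*tracks)]

fs = 8000.0   # sampling rate in Hz (assumed)
hop = 80      # samples between analysis frames (assumed)

# Two partials that fade in, glide slightly upward, and fade out.
partial1 = [(0.0, 440.0), (1.0, 440.0), (1.0, 450.0), (0.0, 450.0)]
partial2 = [(0.0, 880.0), (0.5, 880.0), (0.5, 900.0), (0.0, 900.0)]

y1 = synth_track(partial1, hop, fs)
y2 = synth_track(partial2, hop, fs)
y = mix([y1, y2])
```

Accumulating phase from the interpolated frequency keeps the waveform continuous across frame boundaries, which is one simple way to obtain the smooth variation mentioned above.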

Additive synthesis is not the only sound modeling method that requires
sinusoidal parameter estimation for its calibration to desired
signals. For the same reason that additive synthesis is so effective,
we routinely calibrate *any* model for sound production by
matching the short-time Fourier transform, and in this matching
process, spectral peaks are heavily weighted (especially at low
frequencies in the audio range). Furthermore, when the model
parameters are few, as in the physical model of a known musical
instrument, the model parameters can be determined entirely by the
amplitudes and frequencies of the sinusoidal peaks. In such cases,
sinusoidal parameter estimation suffices to calibrate
non-sinusoidal models.

*Pitch detection* is another application in which spectral
peaks are ``explained'' as harmonics of some estimated fundamental
frequency. The harmonic assumption is an example of a signal modeling
constraint. Model constraints provide powerful means for imposing
prior knowledge about the source of the sound being modeled.
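
One simple way to apply the harmonic constraint is a least-squares fit of a fundamental to the measured peak frequencies. The sketch below assumes a rough initial guess for the fundamental is available (it is used only to assign a harmonic number to each peak); the function name `fit_fundamental` and the peak values are hypothetical, and practical pitch detectors are considerably more elaborate.

```python
def fit_fundamental(peak_freqs, f0_guess):
    """Least-squares fundamental under the harmonic constraint f_k = h_k * f0.

    Each peak is assigned the nearest harmonic number of the rough guess,
    then f0 minimizes sum_k (f_k - h_k * f0)^2, which gives
    f0 = sum(h_k * f_k) / sum(h_k^2).
    """
    harmonics = [max(1, round(f / f0_guess)) for f in peak_freqs]
    num = sum(h * f for h, f in zip(harmonics, peak_freqs))
    den = sum(h * h for h in harmonics)
    return num / den, harmonics

# Measured peak frequencies in Hz (made-up values near multiples of 220 Hz).
peaks = [220.4, 439.1, 661.0, 879.5]
f0, harmonics = fit_fundamental(peaks, f0_guess=215.0)
print(harmonics, round(f0, 2))
```

Note that higher harmonics receive more weight in this fit, since a given frequency error at harmonic number h corresponds to an error of only 1/h of that amount in the fundamental.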

Another application of sinusoidal modeling is *source
separation*. In this case, spectral peaks are measured and tracked
over time, and the ones that ``move together'' are grouped together as
separate sound sources. By analyzing the different groups separately,
*polyphonic pitch detection* and even *automatic
transcription* can be addressed.

A less ambitious application related to source separation may be
called ``selected source modification.'' In this technique, spectral
peaks are grouped, as in source separation, but instead of being
separated, the groups are merely processed differently. For example,
all the peaks associated with a particular voice can be given a gain
boost. This technique can be very effective for modifying one track
in a mix, *e.g.*, making the vocals louder or softer relative to the
background music.
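
Assuming the peaks have already been measured and assigned to groups, the gain boost itself is simple. In the sketch below, the peak representation (a dictionary with `freq`, `amp`, and `group` fields), the group labels, and the function name `boost_group` are all hypothetical; the grouping step, which is the hard part, is not shown.

```python
def boost_group(peaks, group, gain_db):
    """Return a copy of `peaks` with one group's amplitudes scaled by gain_db."""
    gain = 10.0 ** (gain_db / 20.0)   # convert decibels to a linear factor
    return [
        {**p, "amp": p["amp"] * gain} if p["group"] == group else dict(p)
        for p in peaks
    ]

# One analysis frame of grouped peaks (made-up values).
frame = [
    {"freq": 220.0, "amp": 0.50, "group": "voice"},
    {"freq": 440.0, "amp": 0.25, "group": "voice"},
    {"freq": 330.0, "amp": 0.40, "group": "background"},
]

# Make the voice 6 dB louder relative to the background.
louder = boost_group(frame, "voice", gain_db=6.0)
```

The modified peaks would then be resynthesized, frame by frame, in the same way as unmodified ones.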

For purely tonal sounds, such as freely vibrating strings or the human
voice (in between consonants), forming a sinusoidal model gives the
nice side effect of *noise reduction*. For example, almost all
low-level ``hiss'' in a magnetic tape recording is excluded from a
sinusoidal model of it. The lack of noise between spectral peaks in a sound
is another example of a model constraint. It is a strong suppressor
of noise since the noise is entirely eliminated in between spectral
peaks. Thus, sinusoidal models can be used for *signal
restoration*.

- Spectrum of a Sinusoid
- Spectrum of Sampled Complex Sinusoid
- Spectrum of a Windowed Sinusoid
- Effect of Windowing

- Resolving Sinusoids
- Other Definitions of Main Lobe Width
- Simple Sufficient Condition for Peak Resolution
- Periodic Signals
- Tighter Bounds for Minimum Window Length
- Summary

- Sinusoidal Peak Interpolation

- Optimal Peak-Finding in the Spectrum

Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University