Duda Tones with AIM


This correlogram was produced using an implementation of Roy Patterson's Auditory Image Model (AIM), which generates a stabilized auditory image.

This animation was produced in conjunction with Richard Duda of the Department of Electrical Engineering at San Jose State University during the summer of 1989. Thanks to Richard Duda for both the audio examples and the explanation that follows [Duda90]. The following demonstrations are shown in this clip:

1) A 200 Hz tone.

2) The first eight harmonics of a 200 Hz tone.

3) The first eight harmonics, added one at a time (two per second).

4) Same as 3, but faster.

5) The sequence reversed: all eight harmonics are present and are then removed one at a time from the top.

6) Same as 5, but with the harmonics removed from the bottom.

7) All eight harmonics, with vibrato added to each one to make it separate out.

A finite Fourier series is a classical representation of a single periodic waveform as a mixture of component pure tones. When one listens to such synchronized mixtures, one normally hears a single, coherent sound rather than the separate harmonic components or "partials." However, there are interesting circumstances under which the individual harmonics can be heard very clearly.
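As a rough illustration of such a mixture, the sketch below sums the first eight harmonics of a 200 Hz fundamental. The 1/k amplitudes (the ideal sawtooth series), the common starting phase, the 16 kHz sample rate, and the function name `harmonic_complex` are all assumptions made for this example; the original recordings may have used different values.

```python
import math

def harmonic_complex(f0=200.0, n_harmonics=8, duration=0.05, sample_rate=16000):
    """Sum the first n_harmonics of f0 with sawtooth-like 1/k amplitudes."""
    n_samples = int(round(duration * sample_rate))
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        # Finite Fourier series: one sine per harmonic k, all starting in phase.
        sample = sum(math.sin(2 * math.pi * k * f0 * t) / k
                     for k in range(1, n_harmonics + 1))
        signal.append(sample)
    return signal

tone = harmonic_complex()
period = int(16000 / 200)  # samples in one 5 ms period of the 200 Hz fundamental
```

Because every component is an integer multiple of 200 Hz, the summed waveform repeats every 5 ms, which is why the mixture is a single periodic waveform.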

One such circumstance is when the composite tone is built up or broken down sequentially. Another is when the harmonics are individually modulated, whether in amplitude, frequency or phase. A third is when there are only a few harmonics and they form a familiar musical chord. In these situations, one perceives the sound to split into separate and distinct "voices" or "streams" that can emerge from the composite tone.
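The sequential build-up described above can be sketched as follows, with harmonic k switched on abruptly at time (k - 1) * gap. This is only an illustration under stated assumptions: the 1/k amplitudes, the 8 kHz sample rate, and the names `staggered_buildup` and `active_harmonics` are hypothetical and not taken from the original experiments.

```python
import math

def staggered_buildup(f0=200.0, n_harmonics=8, gap=0.5, tail=0.5, sample_rate=8000):
    """Harmonic k switches on abruptly at time (k - 1) * gap seconds."""
    total = (n_harmonics - 1) * gap + tail
    n_samples = int(round(total * sample_rate))
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for k in range(1, n_harmonics + 1):
            if t >= (k - 1) * gap:  # abrupt onset, no amplitude ramp
                sample += math.sin(2 * math.pi * k * f0 * t) / k
        signal.append(sample)
    return signal

def active_harmonics(t, n_harmonics=8, gap=0.5):
    """Number of harmonics sounding at time t (for t >= 0)."""
    return min(n_harmonics, int(t // gap) + 1)

slow = staggered_buildup(gap=0.5)   # two entrances per second
fast = staggered_buildup(gap=0.05)  # 50 ms between entrances
```

Reversing the onset test (switching a harmonic off once its time is reached) would give the corresponding break-down sequence.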

Several experiments were performed to see how amplitude and frequency modulation of the harmonics affect both auditory perception and the correlograms. In the cases described below, the signals were sawtooth waves limited to the first eight harmonics of a 200 Hz fundamental. The following observations were made:

1. If the entire signal is presented abruptly, it sounds like a single, somewhat buzzy sound source, and the component harmonics are not noticeable without concentrated effort. If the signal is built up by starting with the fundamental and rapidly bringing in the harmonics one after another (less than 50 msec between entrances), the result still sounds like a single source but with changing tone color during the "attack." However, if the harmonics are brought in abruptly but slowly (say, 500 msec between entrances), each new harmonic sounds briefly like a new sound source. The highest harmonic tends to remain separated from the rest for a few seconds, but it eventually fuses with the lower harmonics into a single source.

2. If the harmonics are abruptly turned off one at a time, the psychological effect is quite different. It is perhaps best described as sounding like a single sound source whose tone quality or "timbre" undergoes abrupt but subtle changes. This is an example of a rather general phenomenon, in which sudden decreases in amplitude (offsets) are generally much less salient than sudden increases in amplitude (onsets).

3. If the frequency of one of the harmonics is modulated a small amount at a sub-audio rate, its sound vividly emerges as a separate "voice." A one-percent sinusoidal modulation of the frequency of any of the first 12 harmonics of a 200 Hz sawtooth, for example, produces this effect clearly, as does a one-percent step change in frequency. The effect is strong and striking, indicating that the auditory system is highly sensitive to changes in frequency. (We do not know if this is because there are cells that are directly sensitive to frequency modulation, if cells with narrow spectral tuning are seeing sound come in and out of their sensitive region, or if the change in the time-domain wave shape is directly perceived.)
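The frequency modulation in observation 3 can be sketched by applying a one-percent sinusoidal vibrato to a single harmonic while leaving the others fixed. The 6 Hz modulation rate, the choice of the third harmonic, the 1/k amplitudes, and the function names are assumptions for illustration only; the text specifies just "a sub-audio rate." The modulated phase is the time integral of the instantaneous frequency, so the vibrato is added to the phase rather than to the frequency argument directly.

```python
import math

def vibrato_complex(f0=200.0, n_harmonics=8, target=3, depth=0.01,
                    mod_rate=6.0, duration=0.5, sample_rate=8000):
    """Harmonic complex in which only harmonic `target` carries vibrato."""
    n_samples = int(round(duration * sample_rate))
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for k in range(1, n_harmonics + 1):
            phase = 2 * math.pi * k * f0 * t
            if k == target:
                # Integral of 2*pi*k*f0*depth*sin(2*pi*mod_rate*t) from 0 to t.
                phase += (k * f0 * depth / mod_rate) * (
                    1 - math.cos(2 * math.pi * mod_rate * t))
            sample += math.sin(phase) / k
        signal.append(sample)
    return signal

def instantaneous_frequency(k, t, f0=200.0, depth=0.01, mod_rate=6.0):
    """Instantaneous frequency in Hz of a modulated harmonic k at time t."""
    return k * f0 * (1 + depth * math.sin(2 * math.pi * mod_rate * t))

modulated = vibrato_complex()
```

With these values the third harmonic swings between 594 Hz and 606 Hz six times per second, while the other seven components stay locked to multiples of 200 Hz.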

The existence of the separate components is also revealed in the correlograms when they are recorded on videotape and viewed dynamically in real time. Components having separate onsets or offsets are clearly visible when they jointly appear or disappear. Similarly, components having common frequency modulation stand out through their joint motion. Although no calibrated psychoacoustic measurements were made, the brief time required for an amplitude or frequency change to be seen in the correlogram seemed comparable with the time required to hear a new voice emerge. However, when the modulation was stopped, the dynamic response of the correlogram seemed much faster than the time required to hear separate components merge back into one sound stream.
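For readers unfamiliar with correlograms, the sketch below computes a short-time autocorrelation of one frame of the harmonic complex, which is roughly what a single row of a correlogram displays. It is a deliberately simplified stand-in: it omits the cochlear filterbank and is not Patterson's model or the implementation used for the animation. The sample rate, frame length, lag range, and 1/k amplitudes are assumptions.

```python
import math

def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    # Every lag sums over the same n - max_lag products, so lags are comparable.
    return [sum(frame[i] * frame[i + lag] for i in range(n - max_lag))
            for lag in range(max_lag + 1)]

sample_rate = 8000
f0 = 200.0
# 50 ms of the first eight harmonics of 200 Hz.
frame = [sum(math.sin(2 * math.pi * k * f0 * n / sample_rate) / k
             for k in range(1, 9))
         for n in range(400)]

acf = autocorrelation(frame, max_lag=80)
# Strongest peak away from lag 0, searched between 2.5 ms and 7.5 ms.
best_lag = max(range(20, 61), key=lambda lag: acf[lag])
best_delay_ms = 1000.0 * best_lag / sample_rate
```

The peak falls at a delay of 5 ms, the period of the 200 Hz fundamental. In a full correlogram each frequency channel contributes such a row, and the animation shows how the rows change over time.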

These results imply that the information needed to separate harmonic components is present in the correlogram. Furthermore, the primitive nature of the signals implies that the source formation and separation mechanisms do not depend on high-level domain knowledge, but can be performed in the early stages of processing and are properly part of a model of early audition. While the adaptivity of perceptual mechanisms precludes simple explanations, the importance of common modulation implies the need for comodulation detection and grouping functions in any model of the early auditory system [Mont-Reynaud89].

Time Delay