Surround Sound's Point of Origin

Chris Chafe
Summer 2018, EMPAC

all sound examples are binaural
please use headphones or earbuds

Navigate this deck with keyboard or touch

type "m" for the menu of pages and
arrow or page keys to advance or rewind

two-finger tap for the menu of pages and
the usual left/right swipe to advance or rewind

In 1990, Al Bregman published Auditory Scene Analysis and galvanized a movement to understand how primitive perceptual operations combine in parsing the sounds of the world and receiving acoustic communication. He describes the problem with an analogy...

Imagine that you are on the edge of a lake and a friend challenges you to play a game.

The game is this: Your friend digs two narrow channels up from the side of the lake. Each is a few feet long and a few inches wide and they are spaced a few feet apart. Halfway up each one, your friend stretches a handkerchief and fastens it to the sides of the channel.

As waves reach the side of the lake they travel up the channels and cause the two handkerchiefs to go into motion.

You are allowed to look only at the handkerchiefs and from their motions to answer a series of questions:

Are there two motorboats on the lake or only one?

Is the nearer one going from left to right or right to left?

Which one is closer?

Is the wind blowing?

Did something heavy fall into the water?

The lake represents the lake of air that surrounds us. The two channels are our two ear canals, and the handkerchiefs are our ear drums.


(click below) to open Chowning Turenas (1972)

(click below) to open Turenas: the realization of a dream

(synopsis of Chowning -- realization of a dream)

The culmination of two research paths, Turenas (1972), embodies both sound localization (begun in 1964) and Frequency Modulation (FM) synthesis (begun in 1967).

I was fascinated by the idea of composing music for loudspeakers and, in particular, by the idea of composing music in spaces that were compositional constructions in which sound could be positioned, animated and even moved through space, as was suggested by some electroacoustic works, especially those of Stockhausen, e.g. Gesang der Jünglinge (1956).

Moving Sound Sources and Score

Leland Smith joined me in the development of the music system in 1966. Seeing a need to structure the input process to Music IV/Music 10 programs, Smith wrote the program Score that accepted lists of data by “instrument” as input that were correlated and turned into the sequenced “score” as output.

The program to control the movement of sounds through an illusory space was largely complete by 1968 as seen in Figs. 1 & 2. I had generated a number of simple geometrical sound paths in order to evaluate the strength of the illusions. Of course, the azimuth (perceived angle) of the source was highly sensitive to the listener position within the listener space. But as long as the listener was not very close to one of the loudspeakers, the spatial distortion did not seem to destroy the illusion.

While subtle and easily covered, the distance cue seemed to me to be more robust than the azimuthal cue. The direct signal’s amplitude decreases in inverse proportion to an increase in distance.

The amount of reverberation must remain more or less constant, however, providing for a distinction between a sound whose amplitude decreases because of an increase in distance as opposed to a decrease in musical dynamic, in which case the reverberation decreases in the same proportion as does the direct signal. There is a point where the overall intensity of the reverberation is equal to the intensity of the direct signal. Beyond this point, the echo radius,[9] the direct signal is increasingly masked and thus unable to provide cues for either distance or azimuth.
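The distance cue described above can be put into numbers. This is a minimal sketch, not Chowning's program: it assumes a direct gain of exactly 1/distance relative to a reference distance of 1, a perfectly constant reverberation gain, and a hypothetical default reverb gain of 0.25. The function names are made up for illustration.

```python
def distance_gains(distance, reverb_gain=0.25, ref_distance=1.0):
    """Return (direct, reverb) amplitude gains for a source at `distance`.

    Assumption: direct amplitude falls as 1/distance while the
    reverberant amplitude stays constant.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    direct = ref_distance / distance
    return direct, reverb_gain


def dynamic_gains(level, reverb_gain=0.25):
    """Return (direct, reverb) gains for a softer musical dynamic.

    Here both signals scale together, so their ratio does not change.
    """
    return level, level * reverb_gain


def echo_radius(reverb_gain=0.25, ref_distance=1.0):
    """Distance at which direct and reverberant intensity are equal."""
    if reverb_gain <= 0:
        raise ValueError("reverb_gain must be positive")
    return ref_distance / reverb_gain
```

Doubling the distance halves the direct-to-reverberant ratio, while halving the musical dynamic leaves it unchanged. That difference is what lets a listener tell "farther" from "softer" in this simplified model.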

Frequency Modulation (FM) Synthesis: a fortuitous diversion

FM synthesis was capable of producing a large number of differentiated timbres with only small changes in a few numbers or control functions. This attribute suggested continuous changes in timbre or metamorphoses, for which there were compelling graphic analogs especially in the designs by M. C. Escher.
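To show how few numbers are involved, here is a minimal sketch of simple FM with one sine carrier and one sine modulator. The function names, defaults and sample rate are illustrative assumptions, not taken from Music 10 or from the Turenas instruments.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index,
            duration_s=1.0, sample_rate=44100, amplitude=1.0):
    """Simple FM: amplitude * sin(2*pi*fc*t + index * sin(2*pi*fm*t))."""
    count = round(duration_s * sample_rate)
    samples = []
    for n in range(count):
        t = n / sample_rate
        phase = (2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * modulator_hz * t))
        samples.append(amplitude * math.sin(phase))
    return samples


def sideband_frequencies(carrier_hz, modulator_hz, order):
    """Frequencies |fc + k*fm| for k in -order..order.

    Negative frequencies reflect around 0 Hz, hence the abs().
    """
    return sorted({abs(carrier_hz + k * modulator_hz)
                   for k in range(-order, order + 1)})
```

Only three numbers (carrier, modulator, index) set the spectrum. Sweeping `index` over the course of a note is one way to get the continuous timbral change the text describes.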

(click below) to open Bregman Auditory Scene Analysis (2005)

(synopsis of Bregman -- Auditory Scene Analysis)

Visual Scene Analysis became a topic in early computer vision. A process, called “scene analysis,” was critical for achieving correct descriptions of the objects.

In a spectrogram of a real-life mixture of sounds, frequency components from different sounds are overlaid on top of one another.

The dimensions of organization are Horizontal (Sequential) and Vertical (Spectral).

(Horizontal organization)

Loss of rhythmic information as a result of stream segregation.

(Horizontal organization)

Segregation of a melody from interfering tones.

(Vertical organization)

Fusion by common frequency change: Illustration 2.


Why Do We Use the Cues We Do?

For an answer, we have to turn to regularities in the “ecology” of sound: What kinds of sound are there? How does sound reach our ears? Do sounds tend to overlap in time?

Why Use Multiple Cues? Why Not the Strongest Ones Only?

The answer is this: If the auditory system used only the strongest cues, there would be some circumstances in which we could not segregate sounds that were separate in the environment. For example, the normal spatial cues for segregation are missing when the signal comes from a single-loudspeaker radio, or around a corner, or when two sound sources are close together (e.g., a singer and a guitar) so their acoustic components all seem to come from the same location.

(ASA-related interests)

There is a strong practical reason for designing computer systems that can carry out ASA.

Since nonhuman animals also face a world in which sounds are mixed at their ears, scientists have begun to study ASA in animals.

There has also been an interest in ASA by music theorists and composers.

The research on ASA has also attracted the attention of audio engineers, because their job is to control the blending of sounds in the recording or reinforcement of musical performances.

Friday, on the river.

Sound and image.

Longer clip, develops context further.

All three people become situated in the last clip.

Illusions play with inconsistencies between cues and context.

Narrative uses cues and context, and can create shifts in “point of origin” (deixis).

Consciousness, with its "constant" properties (e.g., focus, dynamic character, point of view, and so forth) and "variable" properties (e.g., immediacy vs. displacement, factuality vs. fictionality), as well as its flow and displacement, explains much of what we do with language.

Anca M. Nemoianu (1996 review of) Wallace Chafe, Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago and London: University of Chicago Press, 1994.

...what we can do with mediated narratives mirrors what we do with language….

(click below) to open Harvey Mortuos Plango

(click below) to open Patricia Dirks' article on Mortuos

(click below) to open The Revenge

(click below) to open Alter Bahnhof

(click below) to open Blauert Communication Acoustics (2002)

(synopsis of Blauert -- Communication Acoustics)

An important class of systems in communication acoustics is audio transmission systems. These systems allow transmission of auditory scenes across time and space. In other words: They make it possible for listeners to listen to acoustical events at different locations and/or at different times than where they originally happened.

If authentic transmission is aimed at, so-called binaural systems are a good choice (systems which use artificial heads as a front end).

The research goal is to understand the perceptual and mental capabilities and processes of the listeners in the context of auditory analysis, and to model and simulate these instrumentally - by software algorithms.


There is pronounced technological demand for audio analysis - these days often called CASA (Computational Auditory Scene Analysis).

Binaural systems have two front ports which take the signals from the left and right ear of a human or a dummy head as an input.

A 3-dimensional time-varying pattern (frequency, intensity, lateral position) results which is called a "Binaural Activity Pattern". A number of tasks within CASA can be performed based on this, so far strictly bottom-up processing.
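The lateral-position axis of such a pattern is commonly derived from interaural cross-correlation. The sketch below is a single-band, single-frame illustration of that idea only. It is not Blauert's model: it has no filterbank, no time windowing and no intensity axis, and the function names are hypothetical.

```python
def interaural_cross_correlation(left, right, max_lag):
    """Correlate left[i] with right[i + lag] for each lag in samples.

    Returns a dict mapping lag -> correlation score. A peak at a
    positive lag means the right-ear signal lags the left-ear signal.
    """
    n = min(len(left), len(right))
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += left[i] * right[j]
        scores[lag] = total
    return scores


def estimate_itd(left, right, sample_rate, max_lag_s=0.001):
    """Return the lag (in seconds) with the highest correlation."""
    max_lag = int(round(max_lag_s * sample_rate))
    scores = interaural_cross_correlation(left, right, max_lag)
    best_lag = max(scores, key=scores.get)
    return best_lag / sample_rate
```

Repeating this per frequency band and per time frame would give one crude lateral-position estimate for each cell of the pattern. The search range of plus or minus 1 ms is an assumption chosen to cover the delays a human head can produce.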

(limitations of analyses)

Analyses of the binaural activity pattern are e.g., used for localisation and tracking of multiple sound sources in not-too-reverberant scenarios. Also decoloration and separation of concurrent sound sources succeeds quite well under these conditions - sometimes even better than humans can do.

Unfortunately, the algorithms decrease rapidly in performance with reflected sound being added in quantities which are typical for common architectural spaces.

Auditory scene analysis has thus obviously reached a point where novel algorithms are required. Further progress may, for example, need the inclusion of knowledge into the systems, particularly a-priori knowledge regarding the scenes to be analysed.

(synthesis of auditory scenes)

Synthesis of auditory scenes is currently of even higher relevance than their analysis - particularly where the listeners can act in or interact with the synthesised scenes. These interactive scenes are often called Auditory Virtual Realities (AVRs).

Applications of VR are aimed at exposing the subjects to a virtual situation such that they feel perceptively "present" in it. This is especially important if scenarios are to be created in which the subjects are supposed to act intuitively - as they would do in a respective real environment.

System components that generate the signals which are presented to the subjects via headphones are called Renderers. The most important component of the auditory renderer is the sound-field model. This is a module which creates a set of binaural impulse responses based on the geometric data of the virtual space, including the absorption characteristics of all walls and geometrical objects in the space, plus the directional characteristics of both sound source and receiver.

Binaural room simulation models have been developed which provide an auditory scenario in such a realistic way that listeners can no longer distinguish between a real and a virtual scenario - i.e. cannot say which one of the two is which in an A/B comparison.

Diversion back to the river to play a little with binaural cues:

ITD – interaural time difference

ILD – interaural level difference

(1) original camera mono source

(2) split sounds and bring 'YEAH' close

(3) push 'WAHOO' even farther away

(4) swap perspectives

(5) create a third listener's perspective

(6) shift to a distant perspective
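A rough idea of how the two cues can be imposed on a mono source is sketched below. It is not the processing used for the river clips. The ITD comes from Woodworth's spherical-head formula; the ILD is a crude frequency-independent assumption (up to 10 dB, scaled by the sine of the azimuth); the head radius, speed of sound and function names are likewise assumed values.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed
HEAD_RADIUS = 0.0875     # m, assumed average head


def woodworth_itd(azimuth_deg):
    """ITD in seconds for azimuth in -90..90 degrees (0 front, +90 right)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def binaural_pan(mono, sample_rate, azimuth_deg, max_ild_db=10.0):
    """Return (left, right) sample lists with ITD and ILD applied.

    The far ear gets the delayed, attenuated copy of the signal.
    """
    delay = int(round(abs(woodworth_itd(azimuth_deg)) * sample_rate))
    ild_db = max_ild_db * abs(math.sin(math.radians(azimuth_deg)))
    far_gain = 10 ** (-ild_db / 20)
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + [s * far_gain for s in mono]
    if azimuth_deg >= 0:      # source on the right: right ear is near
        return far, near
    return near, far
```

At 90 degrees the formula gives an ITD of roughly 0.66 ms. Real ILDs vary strongly with frequency, so a single broadband gain is only a first approximation.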

(Blauert cont'd -- advanced synthesis)

The techniques used today for this purpose are no longer restricted to geometrical approximation but may include explicit solutions of the wave equation by numerical methods e.g., boundary-elements or finite elements. Diffraction and dispersion can thus properly be accounted for.

(coding, transmission, internet collaboration)

With the use of parametric coding it becomes possible for users who actually reside in different locations to place themselves perceptually in a common virtual room, where they may confer together (teleconferencing) or even jointly carry out a mechanical task (tele-operation).

As an entrance to virtual spaces can be provided via the internet, manifold applications can be imagined.

(multi-modal, multi-disciplinary)

As to the synthesis of auditory scenarios, it becomes evident that the VR generators become more and more multi-modal. In a close alliance with the auditory modality, tactile (incl. vibration), visual, proprioceptive information is presented - among those for other senses.

It has to be stressed at this point, however, that it would not make much sense to establish and/or maintain Communication Acoustics only as a sub-area attached to a bigger field - e.g., as a "pet" to media technologies. Communication Acoustics is too broad a field in itself.

Fuerza Imprevista (2016)

Unforeseen Strength is a collaboration between the Latin Grammy award-winners, Mariachi Flor De Toloache and Leyenda Dance Company. (Jaunt VR)

Omnitone is a JavaScript implementation of a third-order ambisonic decoder that allows you to binaurally render an ambisonic recording directly on the browser. (Google Chrome VR)


(click below) to open Fuerza Imprevista

Pearl (2016)

When I say realistic, I mean representing an immersive spatial world of sound, where there’s birds over here and there’s somebody talking over there...the sound design puts you in the scene. As it evolves it’s much more like you’re in a song, and you’re taken out of the scene and you’re allowed to sort of watch the scene rather than be in the scene. JJ Wiesler (Pollen Music Group)

The mobile storytelling project is a true R&D effort for Google’s ATAP group, employing the largest number of shots, sets and characters in any of the Spotlight Stories to date, with custom lighting, effects and interactive surround sound in every shot. Jennifer Wolfe (AWN)

(click below) to open Pearl

Tomato Music
10' soundfile piece in WFS, demo 1 (and optional binaural)

trio: Roberto Morales, Roscoe Mitchell, Chris Chafe
5'30" remix for WFS, demo 2 (and optional binaural)

"mixed music" using live acoustical instruments plus electronic sound never seems satisfactorily balanced

no Goldilocks loudness level

electronics are too feeble, but it's not because of loudness
turned up they become too loud, turned down they are too soft, but never "just right"

WFS-style radiation of source with multiple outputs:
synthesize thin plate physical model

compare to recording of plate

Doug James, Stefan Bilbao, Scott Van Duyne, Julius Smith methods for synthesis

(click below) to open Doug James

Stefan Bilbao:
FDTD thin plate model

Next Generation Sound Synthesis project
University of Edinburgh

FDTD (with non-linearities and air box) in stereo, Chafe experiments 2015

16 channel microphone test

(WFS, demo 3 and optional binaural playback)

16 channel brass plate recording

(WFS, demo 4 and optional binaural playback)

plate playback with channel scattering patterns

(WFS, demo 4 with scattering and optional binaural)

16 channel compressor recording with playback moved by 1 meter

(WFS, demo 5 at 0.5m distance)

(WFS, demo 5 at 1.5m distance)

(and optional binaural playback)
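Moving a virtual source nearer to or farther from a WFS array mostly changes the per-loudspeaker delays and gains. The sketch below is a simplified point-source model only: delay is distance divided by the speed of sound and gain is 1/sqrt(distance). It omits the WFS pre-filter and angle weighting and is not the renderer used for these demos. The 16-speaker line with 0.1 m spacing and the function names are assumptions.

```python
import math


def linear_array(count=16, spacing_m=0.1):
    """x positions (m) of a centered line of loudspeakers at y = 0."""
    half = (count - 1) * spacing_m / 2
    return [i * spacing_m - half for i in range(count)]


def wfs_delays_and_gains(source_xy, speaker_xs, speed_of_sound=343.0):
    """Return one (delay_s, gain) pair per loudspeaker.

    source_xy is the virtual source position; a negative y places it
    behind the array, away from the listeners.
    """
    sx, sy = source_xy
    pairs = []
    for x in speaker_xs:
        r = math.hypot(x - sx, sy)
        if r == 0:
            raise ValueError("source coincides with a loudspeaker")
        pairs.append((r / speed_of_sound, 1 / math.sqrt(r)))
    return pairs


def delay_spread(pairs):
    """Difference between the largest and smallest delay in seconds."""
    delays = [d for d, _ in pairs]
    return max(delays) - min(delays)
```

With these assumed positions, a source 0.5 m behind the array produces a larger delay spread (a more curved wavefront) than one 1.5 m behind it. That difference in curvature is the cue the two playback distances rely on in this simplified model.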

Real-time thin plate synth:
2D waveguide mesh into 16 WFS sources
new formulation in C++ (starting from STK version)
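For readers unfamiliar with the technique, here is a minimal sketch of a rectilinear 2D waveguide mesh written as its equivalent finite-difference update, with 16 pickup points standing in for the 16 WFS sources. It is not the C++ formulation mentioned above and not STK code. It is lossless, models an ideal membrane rather than a stiff plate, and is far too slow in Python for real time. Mesh size, strike point and tap positions are assumptions.

```python
def make_mesh(width, height):
    """Grid of zeros indexed as mesh[y][x]."""
    return [[0.0] * width for _ in range(height)]


def step_mesh(current, previous):
    """One time step: half the sum of four neighbours minus the past value.

    Boundary junctions are left at zero (clamped edges).
    """
    height, width = len(current), len(current[0])
    nxt = make_mesh(width, height)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            neighbours = (current[y - 1][x] + current[y + 1][x]
                          + current[y][x - 1] + current[y][x + 1])
            nxt[y][x] = 0.5 * neighbours - previous[y][x]
    return nxt


def run_plate(width=18, height=10, steps=64, strike=(5, 4), channels=16):
    """Strike the mesh once and record `channels` taps along one row."""
    if width < channels + 2:
        raise ValueError("mesh too narrow for the requested channels")
    previous = make_mesh(width, height)
    current = make_mesh(width, height)
    current[strike[1]][strike[0]] = 1.0
    taps = [(x, height // 2) for x in range(1, channels + 1)]
    outputs = [[] for _ in taps]
    for _ in range(steps):
        for ch, (x, y) in enumerate(taps):
            outputs[ch].append(current[y][x])
        previous, current = current, step_mesh(current, previous)
    return outputs
```

Each of the 16 output lists would feed one WFS source. Keeping the taps in their mesh order corresponds to "geometry intact", while permuting the lists before playback would be one way to approximate "channel scattering".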

synth with panning changing between channel scattering and geometry intact

(WFS, demo 6 channel scattering)

(WFS, demo 6 geometry intact)

(and optional binaural playback)

plate excited by beer fermentation bubbles

(WFS, demo 6 bubbles and optional binaural playback)

plate excited by cello string "scrunch" slowed way down

(WFS, demo 7 and optional binaural playback)