Navigate this deck with keyboard or touch

type "m" for the menu of pages and
arrow or page keys to advance or rewind

two-finger tap for the menu of pages and
the usual left/right swipe to advance or rewind

embedded media clips (audio and video) use an extra click (or swipe) to begin playing
and another to finish

"Close Your Eyes and Imagine What You Want to Hear"

Research, Craft, and Reality in
Creating Spatial Audio Environments

(1) original mono source

(from a camera video, repeated)

put on your headphones

take it from here

(1) original mono source

(from a camera video, repeated)

(2) move you close to 'YEAH' voice

(3) push 'WAHOO' voice farther away

(4) move you very near to 'WAHOO', far from 'YEAH'

(5) move you to an off-center perspective

(6) move you to a more distant perspective

take off your headphones

nod your head when you get here



Deixis designates referring expressions such as
  • this / that
  • here / there
  • now / then
  • I / we / you
joined, where appropriate, to bodily postures, gestures and gaze.

William F. Hanks, Explorations in the Deictic Field, in Current Anthropology 46(2), April 2005, 191-220

"The central insight here is that the meaning of any individual item derives from its contrast with other items in the same domain."


"...joined, where appropriate, to bodily postures, gestures and gaze."
  • presumably gaze includes hearing
  • we know that vision and sound integrate strongly: for example, sound from loudspeakers placed away from a video screen will seem to "jump" to the visual source
  • proprioception and sound integrate strongly, too: tilt your head, does my voice move above your head?

But it is also based on (inter)subjective context, understood in terms of speakers’ perception, attention focus, bodily orientation, and gestures. From this viewpoint, the basic function of deixis in any language is to orient the subjective attention of the interactants, who are, in turn, presumed to be in “the natural attitude,” that is, wide awake, with a sense of their own bodies, integrating sensory data from vision, hearing, and touch. Hence deixis provides a basic system of coordinates, and to explain the meaning of an utterance such as “There goes Jack” we must give an account of the semantics of the expressions plus the orienting function of the actual utterance in situ.
(adds hearing)

In 1990, Al Bregman published Auditory Scene Analysis and galvanized a movement to understand how primitive perceptual operations combine in parsing the sounds of the world and receiving acoustic communication.

He describes the problem with an analogy...


Imagine you are on the edge of a lake and a friend challenges you to play a game.

The game is: Your friend digs two narrow channels up from the side of the lake. Each is a few feet long and a few inches wide and they are spaced a few feet apart. Halfway up each one, your friend stretches a handkerchief and fastens it to the sides of the channel.

As waves reach the side of the lake they travel up to the channels and cause the two handkerchiefs to go into motion.

You are allowed to look only at the handkerchiefs and from their motions to answer a series of questions:
  • How many boats are there on the lake and where are they?
  • Which is the most powerful one?
  • Which one is closer?
  • Is the nearer one going from left to right or right to left?
  • Is the wind blowing?
  • Has any large object been dropped suddenly into the lake?
Solving this problem seems impossible, but it is a strict analogy to the problem faced by our auditory system.

The lake represents the lake of air that surrounds us. The two channels are our two ear canals, and the handkerchiefs are our ear drums.

put on headphones

play through the following slides and stop after the kayak video is repeated



Auditory Scene Analysis
  • Visual Scene Analysis became a topic in early computer vision. A process, called “scene analysis,” was critical for achieving correct descriptions of the objects.
  • In a spectrogram of a real-life mixture of sounds, frequency components from different sounds are overlaid on top of one another.
  • The dimensions of organization are Horizontal (Sequential) and Vertical (Spectral).

(Horizontal organization)

Loss of rhythmic information as a result of stream segregation.

(Horizontal organization)

Segregation of a melody from interfering tones.

(Vertical organization)

Fusion by common frequency change: Illustration 2.


Why do we use the cues we do?
  • For an answer, we have to turn to regularities in the “ecology” of sound:
  • What kinds of sound are there?
  • How does sound reach our ears?
  • Do sounds tend to overlap in time?

Why Use Multiple Cues, Why Not the Strongest Ones Only?
  • The answer is this:
  • If the auditory system used only the strongest cues, there would be some circumstances in which we could not segregate sounds that were separate in the environment. For example, the normal spatial cues for segregation are missing when the signal comes from a single-loudspeaker radio, or around a corner, or when two sound sources are close together (e.g., a singer and a guitar) so their acoustic components all seem to come from the same location.

All three people become situated in the last clip.

Illusions play with inconsistencies between cues and context.

Narrative uses cues and context, and can create shifts in point of origin and relation between objects in the scene.


"Buhler applied the theory of deixis to narratives. He proposed ...deictic field, which operates in three modes:"
  • the here-and-now of the speaker's sensible environment
  • the context of the discourse itself considered as a structured environment
  • the context "of imagination and long-term memory"
Buhler's model attempts "to describe the psychological and physical process whereby the live deictic field of our own bodily orientation and experience" is "transposed into an imaginative construction."

Mary Galbraith, Deictic Shift Theory and the Poetics of Involvement in Narrative, in Judith Duchan, Gail Bruder, Lynne Hewitt, eds. Deixis in Narrative: A Cognitive Science Perspective, 1995


If there is always a fictional speaker whose voice is heard by the reader as he or she reads, then a particular kind of aloneness can never be represented in a narrative. Virginia Woolf, in particular, wrote often about an aloneness for which there is no one to speak, and I think this effect is much more strongly conveyed when the text is not conceptualized as being relayed by a fictional speaker. Consider, for example, the following passage from the "Time Passes" section of To the Lighthouse (Woolf, 1955):

So with the house empty and the doors locked and the mattresses rolled round, those stray airs, advance guards of great armies, blustered in, brushed bare boards, nibbled and fanned, met nothing in bedroom or drawing-room that wholly resisted them but only hangings that flapped, wood that creaked, the bare legs of tables, saucepans and china already furred, tarnished, cracked. (p. 194)

In this passage, I think it is important that there is not only no one who sees what is depicted here, but also that no one speaks. The fictional subjectivity of the passage belongs only to the airs.

Chris will now stop talking.

A different voice will be used as we continue the presentation.

It is the sound of your inner voice which is what you are hearing as you read these words.

How would you rate the clarity of the imagined sound?
  • Perfect
  • Clear
  • Moderate
  • Vague
  • No image
adapted from the Vividness of Visual Imagery Questionnaire (VVIQ Scale)

D. Marks, "Visual Imagery Differences in the Recall of Pictures" Brit. J. Psych (1973)

Is it located inside your head?

How 100 Mechanical Turk workers rated clarity...

...and location inside head.

Apparently, this is not harmful to Mechanical Turk workers
  • Definitely a different type survey. It was fun. Thanks for being creative!
  • This was a weird but interesting hit. I had never thought about this aspect of hearing or inner voices. I will work with it a bit and see what happens. I realized while doing it than depending on where my focus is, I can hear the sound/voice in different manners- in my head- my voice or a different voice, outside my head like someone else is speaking, in my head like someone else is speaking, in my head with the voice I usually hear speaking to me- whatever I think I can make physically happen.
  • i did my best, but some of the questions were not entirely clear to me.
  • As I began the task, I was able to imagine hearing my voice. Then as the task went on, it became more difficult to hear anything, because there was no sound.
  • This was interesting. I have very good hearing and often hear things other people don't. However, I wasn't very good at imagining a friend's voice or a sound coming from outside my head.
  • This was an unusual but very interesting study. It is almost like separating body and soul.
  • It's a fascinating topic and I found it a little difficult to describe in words, so having checkboxes was really helpful.

  • Is there any way to find out what this study is for? This was interesting to do.
  • good study, very intresting
  • This was a very interesting hit. It is one that really makes you listen and concentrate,thanks.
  • It was very interesting to take this survey!
  • This was an interesting task. I've never been asked to do anything quite like it before. This was fun to complete.
  • I found it interesting to not be able to imagine my voice coming from a spot outside myself or any voice saying mechanical turk including that of my mother. I could hear her voice in my mind but could not imagine or hear her saying mechanical turk. Odd and I wonder why that is.
  • very interesting study... a subject I've never thought about but makes complete sense
  • interesting

How strong is imaginary loudness?

Is there a way to synthesize and play to the ears what an imagined sound sounds like?

...please put headphones back on

2014 study with 50 Mechanical Turk workers involved perceived and imagined loudness (quasi-loudness)

1) play back a voice recorded at reference level: "Amazon, Amazon..."
2) compare levels of a "mystery" test sound to that of the voice

Loudness judgements of the "mystery" test sounds played in the browser, compared to the loudness of the pre-recorded voice saying "Amazon"
(sum of judgements: -1 = quieter, 0 = same, 1 = louder)
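The scoring described above (each judgement counted as -1, 0, or 1 and summed per test sound) can be sketched as follows. This is only an illustration of the tally: the sound labels and the responses are hypothetical, not data from the study.

```python
def sum_judgements(responses):
    """Sum -1/0/1 loudness judgements for each test sound.

    responses: dict mapping a test-sound label to a list of judgements,
    where -1 = quieter, 0 = same, 1 = louder than the reference voice.
    """
    totals = {}
    for label, judgements in responses.items():
        for j in judgements:
            if j not in (-1, 0, 1):
                raise ValueError(f"unexpected judgement {j!r} for {label!r}")
        totals[label] = sum(judgements)
    return totals


# Hypothetical responses from four listeners (NOT the study's data).
example = {
    "sound A": [-1, -1, 0, -1],
    "sound B": [0, 1, 0, 0],
    "sound C": [1, 1, 1, 0],
}
totals = sum_judgements(example)
for label, total in totals.items():
    print(label, total)
```

A strongly negative sum means most listeners heard the test sound as quieter than the reference, a sum near zero means roughly equal, and a strongly positive sum means louder.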

1) attend to your inner voice reading: "Mechanical Turk, Mechanical Turk..."
2) remember "mystery" test sound levels and compare to your inner voice

Compared to the perceived scale, the imaginary (quasi-) loudness scale is compressed and shifted.

1) again, read: "Mechanical Turk, Mechanical Turk..."
2) imagine a "mystery" test sound at the same level
3) play the "mystery" test sounds in the browser and select the one closest to the imagined

"Mystery" test sound at same level as "Mechanical Turk, Mechanical Turk..." corresponds to which of the five perceived loudnesses?

future composition project: hear "inner hearing" -- synthesize sounds as if they were imaginary by mapping loudness and including other contrasts (spatial, etc.)


Here an enigma of the auditory field emerges from these two dimensions of field spatiality; both the global, encompassing surroundability of sound, which is most dramatic and fully present in overwhelming sounds, and the often quite precise and definite directionality of sound presence, which is noted in our daily “location” of sounds, are constantly copresent. For the description to be accurate, both surroundability and directionality must be noted as copresent. This “double” dimensionality of auditory field characteristics is at once the source of much ambiguity and of a specific richness that subtly pervades the auditory dimension of existence.

Don Ihde, Listening and Voice: Phenomenologies of Sound, 2007


For when I listen to music I also face the orchestra, and the richness of its aura is such that while facing the orchestra the plenum of sound is full and penetrating. But, as noted above, when I begin to engage the movements of my body that I ordinarily use to locate directions and do so extremely enough, I can suddenly discover the echo from the back of the auditorium which vividly disrupts the previously full “halo” of the music.

In the overwhelming presence of music that fills space and penetrates my awareness, not only am I momentarily taken out of myself in what is often described as a loss of self-awareness that is akin to ecstatic states, but there is a distance from things. The purity of music in its ecstatic surrounding presence overwhelms my ordinary connection with things so that I do not even primarily hear the symphony as the sounds of the instruments.

(click below) to open https://hagiasophia.stanford.edu/

(click below) to open https://ccrma.stanford.edu/groups/chavin/context.html

Longyou Grottoes

Physical Models
  • The 2D Waveguide Mesh computes realistic-sounding simulations of struck plates. The early 90’s saw an initial flurry of development on the model but it hasn’t yet found the greater musical use that its interesting sonic qualities suggest are possible.
  • The overall project goal is a synthesis instrument for real-time music performance and studio composition.
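As a rough illustration of the idea, the rectilinear 2D waveguide mesh is commonly written in an equivalent finite-difference form: each node's next value is half the sum of its four neighbours minus its own previous value. The sketch below uses that form with a fixed (zero-valued) edge. It is not the instrument described in this deck: it has no edge filters, damping, interpolation, or nonlinearity, and the function name, strike position, and tap position are assumptions made for the example.

```python
def simulate_mesh(nx=12, ny=12, strike=(3, 4), tap=(8, 7), steps=200):
    """Run a lossless rectilinear mesh and return the signal at one tap node.

    Assumed, illustrative setup: nx-by-ny interior nodes surrounded by a
    ring of nodes held at zero (a fixed edge), excited by a unit impulse.
    """
    prev = [[0.0] * (ny + 2) for _ in range(nx + 2)]
    curr = [[0.0] * (ny + 2) for _ in range(nx + 2)]
    curr[strike[0] + 1][strike[1] + 1] = 1.0  # unit impulse "strike"
    out = []
    for _ in range(steps):
        out.append(curr[tap[0] + 1][tap[1] + 1])
        nxt = [[0.0] * (ny + 2) for _ in range(nx + 2)]
        for i in range(1, nx + 1):
            for j in range(1, ny + 1):
                neighbours = (curr[i - 1][j] + curr[i + 1][j]
                              + curr[i][j - 1] + curr[i][j + 1])
                # Finite-difference equivalent of the 4-port junction update.
                nxt[i][j] = 0.5 * neighbours - prev[i][j]
        prev, curr = curr, nxt
    return out


samples = simulate_mesh()
print(len(samples), max(abs(s) for s in samples))
```

Writing `samples` to a sound file at an audio sample rate would give a pitched, inharmonic ring. Reading several tap nodes instead of one would give a multichannel output.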

Adding edge filters and reflections, Synthesis Toolkit (12x12 nodes)

Enhancements which will be described (32x8 nodes)

Swinging in space simulated by wavefield synthesis

  • Geometry scaled to speed of sound in metal
  • Multi-tap outputs and wavefield synthesis
  • Mode stretching and passive nonlinearity (allpass filters at edge nodes)
  • Damping at internal nodes
  • Rectangular interpolation (input, output and damping nodes)
  • Excitation "contact sound" mixed into output

Brass plate in front of 15-channel microphone array

Laser doppler vibrometer

15-channel microphone array, 3 msec (channel 8 muted)

Laser doppler vibrometer

overlay of mic and LDV

speed of sound in brass plate and in air

Testing 15-channel microphone array with an impulse generator

Impulse response across mic array (channel 8 muted)

Channels independently normalized (speed of sound in air visible)

Same thing for 15-channel wavefield synthesis (impulse response across array outputs)

Channels independently normalized (speed of sound in air visible)
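The sloping wavefront visible across the array outputs comes from each channel firing slightly later the farther it is from the source. A much simplified sketch of that idea follows: per-loudspeaker delay and gain for a virtual point source behind a line array, using distance divided by the speed of sound. This is a delay-and-attenuate approximation, not a full wavefield synthesis driving function. The 0.1 m spacing, the source position, and the function name are assumptions, and 343 m/s is an approximate value for air at room temperature.

```python
import math

SPEED_OF_SOUND_AIR = 343.0  # m/s, approximate, air at room temperature


def wfs_delays_and_gains(source_xy, n_speakers=15, spacing=0.1):
    """Delay (seconds) and gain per loudspeaker for a virtual point source.

    Assumed geometry: loudspeakers lie on the x axis, centred on the
    origin; the virtual source sits behind the array (negative y).
    """
    half = (n_speakers - 1) / 2.0
    distances = []
    for k in range(n_speakers):
        x = (k - half) * spacing
        distances.append(math.hypot(x - source_xy[0], 0.0 - source_xy[1]))
    nearest = min(distances)
    # Delays are relative to the nearest loudspeaker, which fires first.
    delays = [(d - nearest) / SPEED_OF_SOUND_AIR for d in distances]
    # Simple 1/r attenuation relative to the nearest loudspeaker.
    gains = [nearest / d for d in distances]
    return delays, gains


delays, gains = wfs_delays_and_gains(source_xy=(0.0, -1.0))
for k, (d, g) in enumerate(zip(delays, gains), start=1):
    print(f"channel {k:2d}: delay {d * 1000:.3f} ms, gain {g:.3f}")
```

Moving `source_xy` from call to call changes the delay pattern across the array, which is one way a source could be made to appear to move.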

Impulse response of mesh with 12 output taps

Mesh output (12 channels) fed through WFS array (15 outputs)

Excitation (force hammer recording) added through WFS array

Creates the two wavefronts

Network Rooms

Demo with three rooms interconnected

Standard two-way connection

N-way connections


Freeverb stereo version

Freeverb using network delays
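For readers unfamiliar with Freeverb, its core building block is a comb filter with a one-pole lowpass in the feedback path. Several run in parallel and are followed by allpass filters. A minimal sketch of one such comb is below, with the delay length as a plain parameter. How the networked version in this deck obtains its delays is not shown here, and the parameter values are illustrative rather than Freeverb's own settings.

```python
def lowpass_feedback_comb(signal, delay_samples, feedback=0.84, damp=0.2):
    """One lowpass-feedback comb filter of the kind used in Freeverb.

    delay_samples sets the echo spacing; feedback sets the decay time;
    damp sets how quickly high frequencies die away in the loop.
    """
    buffer = [0.0] * delay_samples
    filter_store = 0.0
    index = 0
    out = []
    for x in signal:
        y = buffer[index]  # signal delayed by delay_samples
        # One-pole lowpass inside the feedback loop.
        filter_store = y * (1.0 - damp) + filter_store * damp
        buffer[index] = x + filter_store * feedback
        index = (index + 1) % delay_samples
        out.append(y)
    return out


impulse = [1.0] + [0.0] * 12
echoes = lowpass_feedback_comb(impulse, delay_samples=4, feedback=0.5, damp=0.0)
print(echoes)
```

With damping set to zero the impulse simply recirculates, losing half its amplitude on every pass through the delay.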


This deck available online at

(click below) to open http://chrischafe.net/

Both these qualities of sound are used simultaneously in what is a most normal human activity, face-to-face speech. The other speaks to me in the “singing” of the human voice with its consonantal clicklike sounds and its vowel tonalities. It is a singing that is both directional and encompassing, such that I may be (auditorily and attentionally) immersed in the other’s presence. Yet the other stands before me. Speech in the human voice is between the dramatic surroundability of music and the precise directionality of the sounds of the things in the environment. It is in this range of variable presence and focus that the distance between musical experience, often taken as an exceptional experience, and the experience of sounds as primarily the sounds of things that are "located" in a place appears. The seductivity of a "pure sensuosity" in Mozart’s music described by Kierkegaard finds support, but with a different ground here. In the overwhelming presence of music that fills space and penetrates my awareness, not only am I momentarily taken out of myself in what is often described as a loss of self-awareness that is akin to ecstatic states, but there is a distance from things. The purity of music in its ecstatic surrounding presence overwhelms my ordinary connection with things so that I do not even primarily hear the symphony as the sounds of the instruments. But the flight of music into ecstasy is quickly lost if the instrument intrudes as in the case of having to listen to the beginner whose violin squeaks and squawks instead of sounding in its own smooth tonality.

(social field = compositional or improvisational relations)
What is needed, instead, is a way of describing how the positions that make up any deictic field are configured according to the social field and what relationship these positions bear to language at the levels of situated utterances, deictic types, and whole deictic systems. We need to know how interactants take up those positions and occupy and vacate them in ordinary practice and how the field varies under social embedding (including different discourse genres [Hanks 1987]). These questions have important consequences for research methodology and for the description of specific languages and societies.

We need a different idea of space, a better theory of how it is integrated with nonspatial aspects of context, and a more thorough treatment of the social embedding of the deictic field.
(language / discourse genres = music / music genres)

The selection and understanding of deictics relies on the simultaneous articulation of space, perception, discourse, commonsense and mutual knowledge, anticipation, and the framework of participation in which Sprs and Adrs orient to one another. Any one of these factors can provide the basis for deictic construal according to the demands of the ongoing relevance structure in which it is produced.

The monkey-human deictic field is far less complex than that of humans, but it includes positions of Speaker, Addressee, and object, body orientation, gestures, and sounds, and memory, anticipation, and emotions play a role.

Ide relates what I have called “construal” to what she calls “discernment” within the scope of the Japanese philosophy of wakimae. This is an excellent comparison consistent with the aims of the paper. Wakimae designates the Spr’s sense of place relative to an Adr and relative to the social setting. This dual orientation is also present in deixis, through what I called “embedding.”

When this set of determinations of the visual field is paired with that of the auditory field, the differences of dimension begin to occur. First, the auditory field as a shape does not appear so restricted to a forward orientation. As a field-shape I may hear all around me, or, as a field-shape, sound surrounds me in my embodied positionality. I am sitting at my desk, and I hear my wife approaching up the steps. She enters the study and speaks to me from the doorway to the left and behind me. I turn to greet her, but she has first been present and noted from behind in the sounding of her feet. I catalog my auditory experiences and note that the ticking of the clock comes from the right, the hiss of the radiator from the left, the hum of the light from above, and the wag of Josephine’s tail from under the table. All of these sounds occur simultaneously and “fill” the auditory field with their complex multiplicities.

But in like manner an audio-imaginary (quasi-auditory) scene is related to the implicit imaginary body. When this perspectival organization of perception is transmitted to imagery then there might as well be a quasi-distance of imagined sounds and hence a correlative quasi-loudness. The quasi-loudness of imaginary events varies with imagined changes of quasi-distance.
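For perceived (not imagined) sound, the simplest physical counterpart to "loudness varies with distance" is the free-field inverse distance law for a point source: sound pressure falls as 1/distance, about 6 dB per doubling. The sketch below assumes that law only as a reference point. The quoted passage does not specify any law, and the study results in this deck suggest the imagined scale is compressed relative to the perceived one.

```python
import math


def level_change_db(old_distance, new_distance):
    """Level change in dB when a free-field point source moves.

    Assumes the inverse distance law; negative means quieter.
    """
    return 20.0 * math.log10(old_distance / new_distance)


def gain_for_distance(distance, reference_distance=1.0):
    """Linear amplitude gain relative to the level at reference_distance."""
    return reference_distance / distance


print(round(level_change_db(1.0, 2.0), 2))  # moving twice as far away
print(gain_for_distance(4.0))               # four times the reference distance
```

Scaling a recorded voice by `gain_for_distance` is one basic cue for pushing it farther away or pulling it closer. Convincing distance usually needs more than gain, for example a change in reverberation.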

A closer examination of the bidimensionality of auditory field-shape shows that there is a certain variability that auditory focusing can reveal in relation to the copresence of surroundability and directionality. The contrast of the musical experience with everyday listening points to two such variations of focal attention. Quite ordinarily, sounds are taken directionally. The hammering from next door is heard as from next door. The sparrow’s song in the garden presents itself from the garden. But if I put myself in the “musical attitude” and listen to the sound as if it were music, I may suddenly find that its ordinary and strong sense of directionality, while not disappearing, recedes to such a degree that I can concentrate on its surrounding presence. Contrarily, when listening to the orchestra and in the highest moments of musical ecstasy, I can (perversely, perhaps) by an act of will also raise the question of directionality; and while I continue to be immersed in the sound, there also emerges a stronger sense of direction.

But what of sound? The mute object stands "beyond" the horizon of sound. Silence is the horizon of sound, yet the mute object is silently present. Silence seems revealed at first through a visual category.

In both cases one’s train of thought is likely to be upset by the "command" of the sound which is so penetrating or loud that he can’t “hear” himself think. The ability to reveal interiors, as the essential penetrability of sound presence, even applies to myself as an embodied being. Sound physically penetrates my body and I literally "hear" with my body from bones to ears.

With the introduction of a second modality of experience, in addition to what has been the predominantly perceptualist emphasis, listening becomes polyphonic. I hear not only the voices of the World, in some sense I “hear” myself or from myself. There is in polyphony a duet of voices in the doubled modalities of perceptual and imaginative modes. A new review of the field of possible auditory experience is called for in which attention would be focused on the copresence of the imaginative.

A second student, however, describes the same type of experience very differently. He "sees" himself jump from the airplane. He does not "feel" the wind or "see" the rushing of the ground to meet him but "sees" himself “out there” as a "quasi-other" jumping and falling toward the ground. On repeating these exercises in different classes, this difference consistently emerges. "Empirically" some self-imaginations are experienced as occurring "in" and “from” one’s own body, while others are objectified in that they place themselves "out there" apart from their sense of body as an "objectified quasi-other" in the imaginative experience.

In imagination the field-shape possibilities of the visual dimension are closer to those of an auditory field-shape than in the perceptual mode.

This deck available online at

along with the code: