Pd lab #4 & Homework assignment #5

Pd localization cues, Schroeder-style reverb and Phase vocoder analysis / resynthesis

The assignment is to create a short radioplay from a recorded text, with stereo audio effects and sound effects. Shoot for a duration of around 120 seconds (plus or minus 30). This demo introduces several techniques, described below. Though it won't be, strictly speaking, a binaural recording, it is intended for headphones. Techniques include localization with interaural intensity difference (IID) and interaural time difference (ITD), Schroeder-style reverb, and phase vocoder processing for time warping and pitch transposition.


True binaural recording uses stereo mics inside your ears to capture, as closely as possible, the exact waves your ear canals receive. Played back over headphones, it preserves the IID and ITD cues that are basic to sound localization. We'll fake those in a Pd patch (~cc/220a/pd/pwrAndDelPan.pd), but that's where our synthesis model stops in terms of true accuracy. The real binaural technique also captures filter (transfer function) differences caused by body parts shadowing and reflecting sounds from various directions: the ear flaps (pinnae), head, shoulders, etc.
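
For orientation, both cues can be computed directly. Below is a small Python sketch using Woodworth's spherical-head approximation for ITD and a constant-power law for IID; the head radius and the exact pan law are illustrative assumptions, not necessarily what the pwrAndDelPan patch uses:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.0875    # m; an average human head (assumed value)

def itd_seconds(azimuth_deg):
    """Interaural time difference via Woodworth's spherical-head
    approximation: ITD = (r / c) * (theta + sin(theta))."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

def iid_gains(azimuth_deg):
    """Interaural intensity difference modeled as a constant-power pan:
    left/right gains trade off as cos/sin so total power stays constant."""
    pan = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)  # map -90..90 deg to 0..pi/2
    return math.cos(pan), math.sin(pan)                 # (left, right)

# A source 45 degrees to the right arrives ~0.4 ms earlier at the near
# ear and is noticeably louder there:
print(itd_seconds(45) * 1000)  # ITD in milliseconds
print(iid_gains(45))
```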


Early work in binaural recording was accompanied by predictions that its superior imaging would attract a huge following and everyone would eventually listen through headphones. A few decades later, this has yet to happen, at least in the sense of offering a large selection of binaurally mastered works. Such works are still problematic for loudspeaker reproduction, and that's one thing that zapped the phenomenon. And besides, why would anyone want to go about their lives listening with headphones and shutting out the world? Well, take off your earbuds and consider that we've clearly arrived at a new moment, but for reasons not related to the fantastic imaging capabilities of binaural techniques (I think we're instead seeing a revolution related more to massive storage and personal music gratification). The ubiquitous earbud phenomenon is begging for binaural content. For a position paper on where this may be going, see Jens Blauert's AES Heyser Lecture from 2003. He makes a provocative case for binaural as part of an increasingly realistic synthetic world with many other modalities contributing.


One artist whose work leverages the medium of headsets is Janet Cardiff. Her work combines various media modalities, and binaural audio is one of the key ones. She composes site-specific 3D audio narratives with a spine-tingling interplay of real and phantom presences, binaurally produced. Her 2005 Words drawn in water for the Hirshhorn Museum in Washington, D.C. is another benchmark and opens up the possibilities of what you might expect to compose for pod-like devices. In her earlier work for SFMOMA in 2001, the composition led participants by the nose through the gallery, each holding a camcorder in playback mode with a pre-recorded self-guided tour. You'd turn a corner and someone in your earphones would be there singing in the space (acoustically convincing, so that you could point to them), only they weren't there then, but at some other point in time: past, alternative present, or future.


The 220a homework factory isn't directed to pod players and walking around, yet (but that's a good idea for the future, or perhaps a final project this year). Instead, your radioplay assignment will be suited to the tethered headphone setup at the CCRMA workstations. Start by picking a short text, which might be a monologue, a group dialog, or whatever you want, even a comic strip (but it should have a script; you can write one). You'll use your own voice, and possibly other voices in combination, depending on the text to be read. If it's a dialog, invite others to read, or, if you're theatrically inclined, use your own voice for the different characters. First-, second-, or third-person narratives are all fine, and I'm hoping we'll get a variety from the class.

Hint #1: timing in the dialog track of a radioplay is different from straight reading: you'll need to leave gaps that provide space for sound effects where appropriate. Pace your reading accordingly.

Hint #2: this assignment needs lots of intermediate files, so invent a descriptive naming scheme and stay consistent.

Hint #3: you'll record your dialog into the left channel of a stereo track and mix down in stereo; stick with 48 kHz.

source material

1) start JACK, start Audacity, set Audio I/O Playback and Record to jack: alsa_pcm (that's the fancy name for the soundcard driver, which is the fancy name for the audio ins and outs on your computer).

Note: if you want, you can use Ardour for recording and audio manipulation instead of Audacity (described below); instructions on how to record into Ardour are here.

2) record a dialog track using your new mic (left channel of a stereo track), reading the text you've chosen. If there's more than one character, record the other voices afterward into separate tracks using Audacity's overdub mode (select Preferences: Audio I/O: Play other tracks while recording new one). At this stage you'll generally want to normalize the files if there's an undesired discrepancy in levels: select a track and apply Effect : Normalize. Then export the dialog tracks into separate files.
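
For reference, peak normalization is simple to state exactly. Here is a Python sketch of what the normalize step does to the sample values (ignoring Audacity's optional DC-offset removal; the 0.99 target peak is an assumed default):

```python
def normalize(samples, peak=0.99):
    """Scale a buffer so its loudest sample reaches `peak`, evening out
    level discrepancies between takes."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return list(samples)  # silence: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]

quiet_take = [0.1, -0.25, 0.2]
print(normalize(quiet_take))  # loudest sample now sits at -0.99
```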

3) with just your own voice, overdub another track imitating sound effects that go with the text. Export. Save the project. Quit Audacity.

localization & reverb

4) for each track, create a panned stereo version. Overall, the radioplay should fill the sound field, positioning dialog and effects in different places and applying motion where appropriate.

a) start pd -jack ~cc/220a/pd/pwrAndDelPan.pd &

b) start Audacity (it must launch after Pd), open the desired track, and set Audio I/O Playback and Record to jack-pure_data (so Audacity talks to the Pd patch, both sending the source sound into Pd to be spatialized and recording Pd's spatialized output)

c) record (overdubbing again) to create a panned version controlled by the L-R slider in the Pd patch, moving it as needed in real time

d) export each panned track into a separate file
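
The patch name pwrAndDelPan suggests constant-power panning (the IID cue) combined with a small interaural delay (the ITD cue). Here is a rough Python sketch of that idea; the pan law, maximum delay, and sample handling are assumptions, not a transcription of the patch:

```python
import math

def pwr_and_del_pan(mono, pan, sr=48000, max_delay_ms=0.65):
    """Pan a mono buffer into stereo: constant-power gains (IID) plus a
    small delay on the far ear (ITD). `pan` runs 0.0 (hard left) to 1.0
    (hard right); ~0.65 ms approximates the largest natural interaural
    delay."""
    angle = pan * math.pi / 2
    gain_l, gain_r = math.cos(angle), math.sin(angle)
    max_delay = int(sr * max_delay_ms / 1000)
    delay_l = int(max_delay * pan)          # right-panned source: delay the left ear
    delay_r = int(max_delay * (1.0 - pan))  # left-panned source: delay the right ear
    n = len(mono) + max_delay
    left, right = [0.0] * n, [0.0] * n
    for i, s in enumerate(mono):
        left[i + delay_l] += s * gain_l
        right[i + delay_r] += s * gain_r
    return left, right

# Centered source: equal gains and equal delays in both ears.
l, r = pwr_and_del_pan([1.0, 0.5], pan=0.5)
```

Moving the patch's L-R slider in real time amounts to sweeping `pan` while recording, which is what step c) captures.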

5) for each panned stereo version, create a reverberated version. This assignment requires at least three distinct reverb settings: for example, in a dream sequence the radioplay might have a narrator in a broadcast announcer's booth, the voice of god speaking in a cathedral, and startling knocking on a bedroom door, each space derived from a different reverb setting.

a) (important!) set Audio I/O Playback to jack: alsa_pcm

b) now experiment with Effect: Plugins 121-135: Freeverb (version 3) to get the right sound for three different reverbs, as mentioned above

c) export each reverberated track into a separate file
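
Freeverb is itself a Schroeder-style design: parallel feedback comb filters (which build up the dense echo tail) followed by allpass filters in series (which diffuse it without coloring the spectrum). A toy Python sketch of the classic topology; the delay lengths and feedback values below are illustrative, not Freeverb's actual constants:

```python
def comb(x, delay, feedback):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (feedback * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g=0.5):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-D] + g*y[n-D].
    Smears echoes in time without changing the magnitude spectrum."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        dx = x[n - delay] if n >= delay else 0.0
        dy = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + dx + g * dy
    return y

def schroeder_reverb(x):
    """Classic topology: four parallel combs with mutually prime delay
    lengths (in samples, assuming 48 kHz) summed, then two allpasses."""
    combs = [(1687, 0.773), (1601, 0.802), (2053, 0.753), (2251, 0.733)]
    wet = [0.0] * len(x)
    for delay, fb in combs:
        for n, v in enumerate(comb(x, delay, fb)):
            wet[n] += v * 0.25
    for delay in (347, 113):
        wet = allpass(wet, delay)
    return wet
```

Longer comb delays and higher feedback give a bigger, longer room; that is essentially what your three contrasting reverb settings vary.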

6) restart Audacity, bring in all the reverberated tracks, and listen to the test mix. Save the project.

sound effects, time-warping, transposition

7) listen again to your original, unpanned imitation sound effects track in Audacity

8) gather a collection of instruments or whatever else into tracks containing the real sound effects, recording or downloading as needed (you can make or modify this material using Tapestrea or ChucK, as in previous labs/assignments), and then quit Audacity

9) start Pd, start Audacity, and now manipulate at least two sound effects with the Pd patch ~cc/220a/pd/phaselockedvoc.pd, using its time (speed) and transposition (transpo) controls
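
In outline, the time (speed) control works the way any phase vocoder does: analyze overlapping windowed frames, advance each bin's phase at its measured rate, and overlap-add the frames at a different hop; transposition can then be obtained by time-stretching and resampling. Below is a slow but self-contained Python illustration. The frame size, hop, and window are illustrative, and phaselockedvoc.pd additionally phase-locks neighboring bins, which this sketch omits:

```python
import cmath, math

def dft(frame):
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(spec):
    N = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def phase_vocoder_stretch(x, stretch, N=128, hop=32):
    """Time-stretch x by `stretch` without changing pitch."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # Hann
    syn_hop = int(hop * stretch)
    prev_phase = [0.0] * N   # analysis phase of each bin, previous frame
    acc_phase = [0.0] * N    # running synthesis phase per bin
    frames = []
    pos = 0
    while pos + N <= len(x):
        spec = dft([x[pos + n] * win[n] for n in range(N)])
        mags = [abs(c) for c in spec]
        phases = [cmath.phase(c) for c in spec]
        for k in range(N):
            expected = 2 * math.pi * k * hop / N     # phase advance if bin k were exact
            dphi = phases[k] - prev_phase[k] - expected
            dphi -= 2 * math.pi * round(dphi / (2 * math.pi))  # wrap to [-pi, pi]
            true_freq = (expected + dphi) / hop      # radians per sample
            acc_phase[k] += true_freq * syn_hop      # advance at the synthesis hop
        prev_phase = phases
        frames.append([m * cmath.exp(1j * p) for m, p in zip(mags, acc_phase)])
        pos += hop
    out = [0.0] * (syn_hop * len(frames) + N)
    for i, spec in enumerate(frames):
        frame = idft(spec)
        for n in range(N):
            out[i * syn_hop + n] += frame[n] * win[n]  # windowed overlap-add
    return out
```

A stretch below 1.0 speeds the sound up; transposing by a ratio t can be done by stretching by t and then resampling by 1/t.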

10) move all sound effects to the same time position as in your imitation track using the Audacity Time Shift Tool

11) repeat steps to pan and reverberate each sound effect track as dictated by the text

12) mix the whole project and save as hw5.wav in the usual way