Spatialization of Monophonic Recordings

Brent Townshend

Music220A, October 28, 2013

Background

Although the homework assignment included recording new material to use for spatialization, I decided to focus instead on creating an easy process for making binaural recordings from monophonic source material. That allowed me to spend more time experimenting with how best to spatialize existing recordings for which no multitrack version is available while, I believe, still satisfying the goals of the homework in terms of learning about spatialization and the production of binaural recordings. For this, I used an excerpt from Monty Python's In Search of the Holy Grail.

Operation

The first step was designing a way to get the desired spatial information into a project. For this, I chose to use Audacity's labeling capability with a regular set of labels. Each label consists of the name of an Actor (really a potential sound source), optionally followed by three coordinates specifying the x,y,z position of the source relative to the listener. The last position set for each Actor is stored and reused if a subsequent label does not specify one, so a fixed-position source need only be given a position in the first label in which that Actor appears. If a subsequent label specifies a position for an Actor that differs from the last given position, the position is interpolated over the intervening time, moving the Actor smoothly from the old position to the new one.
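As an illustration (a made-up fragment, not the label file actually used for this project), an exported Audacity label track is a tab-separated text file with a start time, an end time, and the label text; here the label text carries the Actor name and optional x,y,z position (columns shown with spaces for readability):

    0.000000    0.000000    Knight 1 0 0
    2.500000    2.500000    Horse -2 1 0
    6.000000    6.000000    Knight 4 3 0
    9.250000    9.250000    Horse

In this fragment, Knight would move smoothly from (1,0,0) to (4,3,0) between 0.0 s and 6.0 s, while Horse stays fixed at (-2,1,0); its second label omits the position, so the stored position is reused.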

The labels from Audacity are exported as a text file, and the ChucK programs below read both the monophonic WAV file and the labels file to create a 4-channel version of the input whose apparent position moves according to the specified labels. The x,y positions map onto the azimuth of the source, the z position maps onto an elevation, and the magnitude of the triplet gives the distance of the source. For mapping to a 4-channel horizontal plane of speakers, the elevation is ignored and the two speakers bracketing the calculated azimuth are each driven with a gain proportional to the cosine of the angle between the source and that speaker. The gain is further modulated by the inverse square of the distance, with a distance of one unit giving a gain of 1.0. The 4-channel signal is then mapped to a binaural pair of signals using Binaural4.ck.
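The gain calculation can be sketched as follows (in MATLAB for illustration; the actual implementation is in ChucK, and the speaker azimuths and azimuth convention used here are assumptions, not taken from the project code):

    % Sketch of the 4-channel panning described above.  Assumes four speakers
    % on the horizontal plane at azimuths 45, 135, 225, 315 degrees and an
    % azimuth measured counterclockwise from the +x axis.
    function g = pan4(x, y, z)
        spkAz = [45 135 225 315] * pi/180;   % assumed speaker azimuths (radians)
        az = atan2(y, x);                    % source azimuth from the x,y position
        d  = sqrt(x^2 + y^2 + z^2);          % source distance from the listener
        g  = max(0, cos(az - spkAz));        % nonzero only for the bracketing speakers
        g  = g / max(d^2, eps);              % inverse-square attenuation, gain 1.0 at d = 1
    end

Each of the four output channels is then the input signal scaled by the corresponding element of g, recomputed as the trajectory moves.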

The modules in this program are:

LabelsReader.ck
A class to read Audacity label files in the format described above
Trajectory.ck
A class to smoothly interpolate an x,y,z trajectory (the interpolation is sketched after this list)
HW2.ck
The main program, which maps the input to a pair of output files which form the binaural pair.
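For reference, the interpolation performed by Trajectory.ck can be sketched as follows (in MATLAB for illustration; the project's class is in ChucK, and the choice of simple linear interpolation between label positions is my assumption):

    % Sketch of piecewise-linear interpolation of an x,y,z trajectory.
    % keyTimes is an N-by-1 vector of label times (seconds) and keyPos an
    % N-by-3 matrix of the corresponding x,y,z positions for one Actor.
    function p = trajPos(keyTimes, keyPos, t)
        if t <= keyTimes(1)
            p = keyPos(1, :);                % before the first label: hold the first position
        elseif t >= keyTimes(end)
            p = keyPos(end, :);              % after the last label: hold the last position
        else
            i = find(keyTimes <= t, 1, 'last');
            frac = (t - keyTimes(i)) / (keyTimes(i+1) - keyTimes(i));
            p = (1 - frac) * keyPos(i, :) + frac * keyPos(i+1, :);   % blend adjacent positions
        end
    end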

Results (4-channel -> Binaural)

The source material is Dark Knight.wav and the resulting binaural output signal is Dark Knight-out.wav.

More HRIRs...

There are a few limitations of the above method: the spatial imaging is somewhat coarse and there is no control over the elevation of the sound sources. In addition, rapid changes in the channel gains can produce discontinuities (clicks). To overcome some of these, I looked into a more complete binaural mapping using a set of HRIRs collected by IRCAM in an anechoic chamber. These cover the entire sphere of directions with 15-degree resolution. Using the same labeling for the dynamic position, I put together a MATLAB program to interpolate the HRIRs and convolve them with the source. It finds the 4 sampled HRIRs closest to the direction of the virtual sound source, convolves the signal with each of them, and forms a weighted sum of the results, with the weights determined by the exact position of the virtual source between the sample points in azimuth and elevation (a sketch of this step follows the list below). These programs are:

DK.m
Main MATLAB script to process the signal
bt_synthesis.m
Function to perform the mono->binaural mapping using a set of HRIRs
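The interpolation step can be sketched roughly as follows (an illustrative reconstruction, not the actual bt_synthesis.m; the struct layout, field names, and the uniform 15-degree grid assumed here are my own simplifications):

    % Sketch of the weighted-sum HRIR interpolation described above.  Assumes
    % hrirSet is a struct array with fields az, el (degrees) and left, right
    % (equal-length impulse responses), and that x is a mono signal vector.
    function [yl, yr] = hrirInterpConv(x, hrirSet, az, el)
        step = 15;                                        % assumed grid spacing (degrees)
        az0 = floor(az/step)*step;  el0 = floor(el/step)*step;
        wa = (az - az0)/step;       we = (el - el0)/step; % fractional position in the cell
        corners = [az0 el0; az0+step el0; az0 el0+step; az0+step el0+step];
        w = [(1-wa)*(1-we); wa*(1-we); (1-wa)*we; wa*we]; % bilinear weights
        yl = 0; yr = 0;
        for k = 1:4
            % pick the measured HRIR nearest to this corner of the grid cell
            [~, idx] = min(abs([hrirSet.az] - corners(k,1)) + abs([hrirSet.el] - corners(k,2)));
            yl = yl + w(k) * conv(x, hrirSet(idx).left);  % convolve and accumulate, left ear
            yr = yr + w(k) * conv(x, hrirSet(idx).right); % convolve and accumulate, right ear
        end
    end

For a moving source, weights like these would need to be recomputed as the trajectory changes, for example by processing the signal in short blocks.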

Results (Complete interpolation using 187 HRIRs)

The output from the MATLAB chain, using the same input as the ChucK method above, is DK-bin.wav, with the trajectory plotted below: