Interactive Spatial Audio Performance

Controlling the spatial location of sound sources (16-channel, 3rd-order Ambisonics) in real time with hand gestures and head movement

Soohyun Kim


Performance Video (decoded into 2ch binaural audio)


How does it work?



This project designs an interactive spatial audio performance in which performers control the spatial location of sounds with their body (head orientation or hand gestures) in real time.

VisionOSC, which detects hand gestures, and FaceOSC, which detects head orientation, were used. The detected data are sent to Wekinator, which is trained to convert them into azimuth and elevation angles. These angles are then sent to the AmbiX plugin, an open-source VST plugin that encodes a mono or stereo source into higher-order Ambisonics and positions the sound source according to the azimuth and elevation angles. This project used 3rd-order Ambisonics, which requires (3 + 1)^2 = 16 channels.
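To make the encoding step more concrete, here is a minimal Python sketch of what an Ambisonics encoder does with an azimuth and elevation angle. It is not the AmbiX plugin's code: it only covers the first-order case (4 channels) rather than the 3rd order used in the project, and it assumes ACN channel ordering, SN3D normalization, and azimuth measured counterclockwise from the front. The plugin's own sign convention may differ. The function names are made up for illustration.

```python
import math
from typing import List


def ambisonic_channel_count(order: int) -> int:
    """Number of channels of a full-sphere Ambisonics signal: (order + 1)^2."""
    return (order + 1) ** 2


def encode_first_order(azimuth_deg: float, elevation_deg: float) -> List[float]:
    """Return first-order gains [W, Y, Z, X] for a mono source.

    Assumptions (not taken from the project): ACN ordering, SN3D
    normalization, azimuth counterclockwise from the front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = 1.0                          # omnidirectional component
    y = math.sin(az) * math.cos(el)  # left/right
    z = math.sin(el)                 # up/down
    x = math.cos(az) * math.cos(el)  # front/back
    return [w, y, z, x]


print(ambisonic_channel_count(3))     # 16
print(encode_first_order(90.0, 0.0))  # source directly to the left
```

Each sample of the mono source would be multiplied by these gains to produce the Ambisonics channels. A 3rd-order encoder does the same with 16 gains instead of 4.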

When I went to the CCRMA stage the night before the final presentation, the ChucK code and the trained Wekinator projects for hand gesture and head orientation detection were ready, but I didn't even have a plan for what to perform. I just headed to CCRMA from my dorm room with all my guitar gear at 11 pm and, for four hours, kept jamming randomly with this system. Over those four hours, I reached the point where I could flow with the system: I got used to how far I should turn my head to move the sound source to where I wanted it, and I got used to turning my head almost unconsciously while my hands were playing the guitar. Then, spontaneously, I realized what I should perform at the final presentation. This process was very similar to how I jam on my guitar or other (ordinary) instruments for hours when I have to compose a song or a performance.


Although I haven't made a demo video, I also built a hand gesture detection system that can detect up to two hands and control two independent AmbiX plugins separately, which means I can control the locations of two sound sources at the same time with my two hands.


For the performance itself, I didn't use the face orientation detection from FaceOSC because it was too noisy. Instead, I used only the x-axis value of my nose position as detected by VisionOSC, and I trained Wekinator on that x-axis position while I rotated on the swivel chair at the CCRMA stage. This resulted in much more stable control with good sensitivity.
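The mapping from nose position to azimuth was learned by Wekinator, so the following Python sketch is only a stand-in to show the idea. It replaces the trained model with piecewise-linear interpolation over a few calibration pairs. The calibration values, the normalized x range, and the optional smoothing step are all hypothetical and are not taken from the project.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical calibration pairs: (normalized nose x, azimuth in degrees),
# sorted by x. In the project, Wekinator learned this mapping from examples.
CALIBRATION: List[Tuple[float, float]] = [(0.2, 90.0), (0.5, 0.0), (0.8, -90.0)]


def nose_x_to_azimuth(x: float,
                      calibration: List[Tuple[float, float]] = CALIBRATION) -> float:
    """Interpolate linearly between calibration pairs, clamping at both ends."""
    xs = [point[0] for point in calibration]
    if x <= xs[0]:
        return calibration[0][1]
    if x >= xs[-1]:
        return calibration[-1][1]
    i = bisect_right(xs, x)
    (x0, a0), (x1, a1) = calibration[i - 1], calibration[i]
    t = (x - x0) / (x1 - x0)
    return a0 + t * (a1 - a0)


class Smoother:
    """Exponential smoothing, an optional extra to tame jittery detections."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, new_value: float) -> float:
        if self.value is None:
            self.value = new_value  # first sample passes through unchanged
        else:
            self.value = self.alpha * new_value + (1.0 - self.alpha) * self.value
        return self.value


smoother = Smoother(alpha=0.5)
for x in (0.5, 0.35, 0.2):
    print(smoother.update(nose_x_to_azimuth(x)))
```

A single, directly tracked coordinate such as the nose x position tends to need little smoothing, which matches the stable control described above. A larger `alpha` makes the response faster at the cost of more jitter.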


AmbiX plugin:

I referred to '' by Prof. Ge Wang for the OSC input/output.
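For readers unfamiliar with OSC, here is a minimal sketch of what an OSC message carrying float values looks like on the wire. It is written in Python with only the standard library, not in the ChucK used for the project. The address /wek/inputs and port 6448 are Wekinator's usual defaults for incoming data, but treat them as assumptions and check your own Wekinator settings. The helper names are made up for illustration.

```python
import socket
import struct


def _osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *values: float) -> bytes:
    """Build an OSC message whose arguments are all 32-bit floats."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", v) for v in values)  # big-endian
    return _osc_string(address) + _osc_string(type_tags) + payload


def send_inputs(values, host: str = "127.0.0.1", port: int = 6448) -> None:
    """Send feature values over UDP (host, port, and address are assumed)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(osc_message("/wek/inputs", *values), (host, port))


# Example: one input feature, such as a normalized nose x position.
message = osc_message("/wek/inputs", 0.5)
print(len(message), message)
```

Calling `send_inputs([0.5])` would send that same message to a Wekinator instance listening locally, assuming the default port.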