Music 220c - Project Blog

Reza Payami



Here is a video of the piece, "Dialogue", performed on the evening of May 28, 2013, at Bing Studio.



Here are the source code files for the related ChucK and Processing programs:




The Processing program, which is controllable via OSC messages from an iPhone application such as TouchOSC, was implemented, and the related GUI looks like the following. Each input line, violin and voice, is granulized into four sound sources, so there are eight sound sources in total that can virtually move in space. The spatialization knobs can be moved either automatically, by entering a target value in "Pan Knob Target", or manually, by moving sliders in the iPhone/Processing application.
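For illustration, here is what such a control message looks like on the wire. The sketch below encodes a single-float OSC message in pure Python; the address '/pan/slider/1' is hypothetical, not necessarily the one used by the actual patch:

```python
import struct

def osc_pad(b: bytes) -> bytes:
    """Null-terminate and pad to a 4-byte boundary, as OSC requires."""
    b += b"\x00"
    while len(b) % 4:
        b += b"\x00"
    return b

def osc_message(address: str, value: float) -> bytes:
    """Encode an OSC message carrying one float: address, ",f" type tag, big-endian float."""
    return osc_pad(address.encode()) + osc_pad(b",f") + struct.pack(">f", value)

# A TouchOSC-style fader update (hypothetical address).
msg = osc_message("/pan/slider/1", 0.75)

# The packet could then be sent over UDP to the Processing sketch, e.g.:
# socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(msg, ("127.0.0.1", 9000))
```

Libraries like oscP5 (Processing) and ChucK's OscRecv handle this encoding for you; the point here is only that each slider move is a small self-describing UDP packet.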

Three possible values can be entered in the "Pan Knob" field: '-1' moves the granulized sound sources for the violin (left input), '-2' moves the related sources for the voice (right input), and '-3' moves all eight sources together toward the "Pan Knob Target". Using ChucK spatialization ChuGins such as 'Pan8', 'Pan16', or 'AmbPan', the sound sources can be mapped to different numbers of speakers using different methods.
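As a sketch of what such a ChuGin does under the hood, the following Python function distributes one source over an 8-speaker ring with pairwise equal-power panning. This is only an illustration; the actual Pan8/AmbPan algorithms may interpolate differently:

```python
import math

def ring_pan_gains(azimuth: float, n_speakers: int = 8) -> list:
    """Equal-power pan between the two nearest speakers on a ring.

    azimuth is in [0, 1), wrapping once around the circle; the returned
    list holds one amplitude gain per speaker."""
    pos = (azimuth % 1.0) * n_speakers        # position in "speaker units"
    lo = int(pos) % n_speakers                # speaker behind the source
    hi = (lo + 1) % n_speakers                # speaker ahead of the source
    frac = pos - int(pos)                     # how far toward `hi` we are
    gains = [0.0] * n_speakers
    gains[lo] = math.cos(frac * math.pi / 2)  # equal-power crossfade:
    gains[hi] = math.sin(frac * math.pi / 2)  # cos^2 + sin^2 = 1
    return gains
```

Sweeping `azimuth` from 0 to 1 (as the "Pan Knob Target" automation does) moves the source smoothly around the ring while total power stays constant.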

The piece starts slowly with violin harmonics. Gradually, the electronic part manipulates the violin sound, accompanied by long vocal notes. The singer then stops singing and starts speaking. As the grain size decreases, the Disklavier enters and the crescendo continues until the piece reaches a climax. Then the violin, voice, and electronics fade out and the Disklavier performs a solo section. This part is followed by improvisation from the different performers interacting with the electronics. The coda recapitulates the violin harmonics accompanied by the other instruments. Finally the piece ends with a fade-out.



The final piece will include the components and performers shown below. Instead of using Zirkonium, spatialization will be implemented in ChucK and Processing, controlled by a GUI and an iPhone through OSC messages.



"Threnody for the Victims of Hiroshima" and its different sections can act as a model for the piece.

Animating score:

Instead of using a string orchestra, each instrument can be manipulated by a different computer/performer to apply effects such as pitch-shifting, controllable in real time.



The following is the initial diagram of the possible instruments and devices; some of the parts may be eliminated.




Zirkonium was installed and tested, and it seems to be an appropriate tool for real-time spatialization. It can handle different customized speaker configurations, including dome-based arrangements. Zirkonium can play different audio tracks and input channels simultaneously and seems to be a good choice for combining fixed media with real-time performance, although its latency needs to be measured and tested. It appears to accept OSC messages to control the location of each sound source.



Some ideas related to the composition and the final piece were discussed. Apart from the instruments involved and the underlying technology, the piece will have different sections using different articulations of each instrument. Celletto, violin, guitar, and voice are some options to be used in the piece. The two implemented components, "face detection + Wiimote + voice synthesis" and "granular synthesizer", may also be included as instruments in the composition. In addition, the "cellular automata + q3osc" program can be used to control the Disklavier in real time. Zirkonium can act as a tool for real-time spatialization using the speakers at Bing Studio.



Glitch-Free FM Vocal Synthesis by Chris Chafe and the LPC Toolkit by Mark Cartwright have been two other resources studied for vocal synthesis.

Extending the face detection in FaceChant.pde, the coordinates of the detected face rectangle's origin are sent via OSC messages to a modified version of the source-filter formant synthesizer model. Some parameters, such as vibrato, are controlled by the Wiimote, and the vocal pitches can be entered using keyboard input.
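The mapping from pixel coordinates to synthesis parameters can be as simple as a clamped linear scaling. The sketch below is illustrative; the parameter ranges are hypothetical, not the ones used in FaceChant.pde:

```python
def scale(value, in_lo, in_hi, out_lo, out_hi):
    """Linearly map `value` from [in_lo, in_hi] to [out_lo, out_hi], clamped."""
    t = (value - in_lo) / (in_hi - in_lo)
    t = min(max(t, 0.0), 1.0)
    return out_lo + t * (out_hi - out_lo)

# Map the face-rectangle origin in a 640x480 camera frame to two
# synthesis parameters (hypothetical ranges, for illustration only):
vibrato_depth = scale(320, 0, 640, 0.0, 0.2)   # x position -> vibrato depth
loudness      = scale(120, 0, 480, 0.0, 1.0)   # y position -> amplitude
```

In the actual setup these scaled values would be packed into OSC messages and sent from the Processing sketch to the synthesizer.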

Using granular synthesis, a 6-channel draft piece based on a short sentence was written. The output amplitude is controlled by the Wiimote's y-axis. The draft has the following sections, using granular synthesis on the short sentence as the only source material.
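Since the draft is built entirely from granulating one recorded sentence, here is a minimal sketch of the technique in Python (the piece itself uses ChucK; the grain size, hop, and jitter values here are illustrative):

```python
import math
import random

def hann(n):
    """Hann window of length n, used to fade each grain in and out."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def granulate(source, grain_size, hop, out_len, read_jitter=0):
    """Overlap-add Hann-windowed grains read from `source`.

    grain_size and hop are in samples; smaller grains give a noisier,
    more granular texture. read_jitter randomizes each grain's read
    position, scattering the original sentence in time."""
    win = hann(grain_size)
    out = [0.0] * out_len
    onset = 0
    while onset + grain_size < out_len:
        read = (onset + random.randint(-read_jitter, read_jitter)) % (len(source) - grain_size)
        for i in range(grain_size):
            out[onset + i] += source[read + i] * win[i]
        onset += hop
    return out

# Granulate a short test tone standing in for the recorded sentence.
src = [math.sin(2 * math.pi * 440 * t / 44100) for t in range(4096)]
out = granulate(src, grain_size=256, hop=64, out_len=2048)
```

Shrinking `grain_size` over time is what produces the move from recognizable speech toward the dense texture described in the section list above.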


For real-time interaction, I tried to find a proper solution for lip-motion detection to drive the speech analysis component. OpenCV is an open-source C++ library for computer vision that can be helpful here. Using its Processing library, the following file was implemented to detect the face boundary and its movements.


Because there is a problem with the recent QuickTime version on Mac when used with the Processing OpenCV library, the image was captured using the 'video' library and then copied to OpenCV, instead of being captured directly by OpenCV.

The active appearance model (AAM) sounds like a more accurate method for lip and mouth detection: it matches a statistical model of object shape and appearance to a new image.



A research/project topic I'm thinking about is real-time voice analysis/synthesis. The following are some links and materials that have been useful in providing ideas and solutions.

"" by Perry Cook is based on Formant_Synthesis_Model, a source-filter model in which the source models the glottal impulse train and the filter models the formant resonances of the vocal tract. In such a model, either the source or the filter can be changed to achieve results more interesting than a normal voice.
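The source-filter idea can be sketched in a few lines of Python: a crude impulse-train "glottis" passed through cascaded two-pole resonators, one per formant. The formant frequencies and bandwidths below are textbook-style values for an /a/-like vowel, not taken from Cook's code:

```python
import math

def resonator_coeffs(freq, bandwidth, sr):
    """Two-pole resonator modeling one formant (pole radius set by bandwidth)."""
    r = math.exp(-math.pi * bandwidth / sr)
    a1 = -2 * r * math.cos(2 * math.pi * freq / sr)
    a2 = r * r
    b0 = 1 - r  # rough gain normalization
    return b0, a1, a2

def formant_voice(pitch, formants, dur, sr=44100):
    """Glottal source (impulse train) filtered by cascaded formant resonators."""
    n = int(dur * sr)
    period = int(sr / pitch)
    sig = [1.0 if i % period == 0 else 0.0 for i in range(n)]  # crude glottal pulses
    for freq, bw in formants:  # run the signal through each resonator in series
        b0, a1, a2 = resonator_coeffs(freq, bw, sr)
        y1 = y2 = 0.0
        out = []
        for x in sig:
            y = b0 * x - a1 * y1 - a2 * y2  # direct-form difference equation
            y2, y1 = y1, y
            out.append(y)
        sig = out
    return sig

# An /a/-like vowel: first two formants near 700 Hz and 1200 Hz.
samples = formant_voice(110.0, [(700, 80), (1200, 90)], dur=0.1)
```

Changing the source (e.g. noise instead of pulses) or sweeping the formant frequencies is exactly the kind of manipulation the paragraph above describes.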

"VoicForm" is a four-formant synthesis STK instrument used in the ChucK voic-o-form example, implemented in FormSwep.h, FormSwep.cpp, VoicForm.h, and VoicForm.cpp.

Julius Smith's Physical Audio Signal Processing, especially the sections on Kelly_Lochbaum_Scattering_Junctions, Vocal_Tract, and Linear_Predictive_Coding_Speech, as well as Perry Cook's thesis, have been useful resources.
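For reference, the Kelly-Lochbaum junction reduces to one reflection coefficient per pair of adjacent tube sections. A minimal sketch, assuming the vocal tract is modeled as a chain of cylindrical sections with cross-sectional areas A_i:

```python
def reflection_coefficients(areas):
    """Kelly-Lochbaum scattering: k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i)
    for a vocal tract modeled as concatenated cylindrical tube sections.
    Equal adjacent areas give k = 0 (no reflection); a large area jump
    pushes k toward +/-1 (strong reflection)."""
    return [(a2 - a1) / (a2 + a1) for a1, a2 in zip(areas, areas[1:])]

# A crude area function that widens from glottis toward the lips
# (illustrative values, not a measured vocal tract):
ks = reflection_coefficients([1.0, 1.0, 2.0, 4.0, 8.0])
```

These coefficients are what a waveguide vocal-tract model scatters its traveling waves with at each junction, and they connect directly to the LPC coefficients discussed in the same chapters.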

I may also extend and integrate some real-time interaction components from my 220b project, which uses q3osc and motion detection. As an example of real-time control, the above was modified by Rebecca Fiebrink and Ge Wang to provide joystick and keyboard control for manipulating voice synthesis parameters.