From CCRMA Wiki
Revision as of 00:32, 10 December 2009 by Shiweis (Talk | contribs)


Pictone - Vision Based Music System

Music 256A Fall 2009 Final Project. Shiwei Song (shiweis)



Modern computers have enabled novel ways for people to interact with and generate sound. My project is to build an instrument whose inputs are pictures drawn with pen and paper. The system takes a video feed of simple shapes and lines drawn on paper and converts them into notes of different instruments. The audience sees the performer's drawings and a visualization of the notes, and hears the corresponding sonic results.

My initial inspiration came from the Birds on the Wires video. The creator converted a picture of birds on wires to musical notes and produced a short piece of music that is both simple and beautiful. I felt that creating sound through drawings has the following advantages:

  • Most people can write or draw simple shapes. This system invites anyone to try and make some music/noise (and have fun!).
  • The pictorial music language is novel to the audience, so they will be constantly surprised or left wondering what the next output will be.
  • The characters/pictures in the music language can themselves carry some sort of meaning (perhaps when combined together), so both visual and sonic messages are conveyed.
  • "Live coding" by drawing is fairly unconventional and this itself may be an interesting experiment to perform or observe.
  • Drawing has a lot of freedom that cannot be mapped to traditional hardware devices. The performer can further influence the system by changing the way images are captured, such as by rotating the paper. Combined, these give the performer a lot of room for interaction with the instrument.

I hope that my project will serve as the basis for a system that will eventually demonstrate the advantages listed above.


Although there is a lot of room to develop and explore, the goal of this project is to create a basic working system that consists of:

  • A simple music language that maps to sound and other controls (such as loops).
  • A program that can recognize music written in the simple language and synthesize sound/give some visual feedback.

The user will interact with the system by drawing "sentences" in the simple language (or using pre-drawn ones), holding them up to the camera to be digitized, and repeating. The system will then output the sound of the music it has digitized so far, incorporating loops and perhaps several tracks. The audience will see what the performer has drawn through a live feed from the camera, along with a visualization of what is being played.



Milestones & Extensions



The project will consist of two parts: the music language and the software system.

Picture Music Language

This project will construct a simple form of the picture music language. The primary concern is to have a good mapping from pictures to their internal representations while keeping the pictures easily recognizable by the software (so perhaps sacrificing picture meaning/aesthetics for ease of recognition).

The language will control two aspects of the music:

  • The controls/flow of the sound produced, such as loops, tracks, speed, etc.
  • Actual notes/instruments and pauses (lack of notes).
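
As a rough illustration of how these two aspects could meet in an internal representation, here is a minimal sketch. The symbol names, the pitch assignments, and the `loop_start`/`loop_end` markers are all hypothetical; the language itself has not been fixed yet, so nothing here should be read as the final design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    pitch: Optional[int]  # MIDI note number, or None for a pause
    beats: float = 1.0


# Hypothetical mapping from recognized shapes to pitches (an assumption).
NOTE_SYMBOLS = {"triangle": 60, "square": 64, "smiley": 67}


def interpret(symbols: List[str], repeats: int = 2) -> List[Event]:
    """Expand a drawn "sentence" of symbol names into a flat event list.

    Symbols between "loop_start" and "loop_end" are repeated `repeats` times.
    """
    events: List[Event] = []
    loop_body: Optional[List[Event]] = None
    for symbol in symbols:
        if symbol == "loop_start":
            if loop_body is not None:
                raise ValueError("nested loops are not handled in this sketch")
            loop_body = []
        elif symbol == "loop_end":
            if loop_body is None:
                raise ValueError("loop_end without loop_start")
            events.extend(loop_body * repeats)
            loop_body = None
        else:
            if symbol == "rest":
                event = Event(pitch=None)
            elif symbol in NOTE_SYMBOLS:
                event = Event(pitch=NOTE_SYMBOLS[symbol])
            else:
                raise ValueError(f"unknown symbol: {symbol}")
            # Collect into the loop body if one is open, else emit directly.
            (events if loop_body is None else loop_body).append(event)
    if loop_body is not None:
        raise ValueError("loop_start without loop_end")
    return events


sentence = ["triangle", "loop_start", "square", "rest", "loop_end"]
pitches = [event.pitch for event in interpret(sentence)]
```

Keeping control symbols and note symbols in one flat sequence means the vision component only has to report an ordered list of recognized characters; tracks and speed could be added as further control symbols in the same way.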

Some design questions I'm still debating:

  • Whether to have notes from the same instrument or selected notes from several instruments.
  • Whether to allow polyphony, and if so, through what means? Either have the user input different tracks (probably easier) or use some spatial mapping in the language (such as position on paper).
  • What should color and shape map to? Should vertical position on paper be used?

So far I've tried some of the basic shapes below. I converted each picture to a binary image and computed its distance transform. The sum of pixel differences between two transformed pictures is then taken as a measure of their similarity. Using this method I was able to detect the triangle, square, person icon, and smiley face under fairly ideal conditions. I can also detect the primary colors fairly accurately. The square and circle, however, were too similar to be differentiated.
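
The comparison described above can be sketched in a few lines. This is not the project's OpenCV code: it is a pure-Python stand-in that assumes tiny hand-made binary grids, a city-block distance to the nearest drawn pixel, and two made-up templates (`SQUARE` and `PLUS`), purely to show the idea.

```python
from collections import deque


def parse(rows):
    """Turn rows like "10001" into a grid of 0/1 ints (1 = drawn pixel)."""
    return [[int(ch) for ch in row] for row in rows]


def distance_transform(binary):
    """City-block distance from every pixel to the nearest drawn pixel."""
    n_rows, n_cols = len(binary), len(binary[0])
    dist = [[None] * n_cols for _ in range(n_rows)]
    queue = deque()
    for r in range(n_rows):
        for c in range(n_cols):
            if binary[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    if not queue:
        raise ValueError("image has no drawn pixels")
    # Multi-source breadth-first search outward from all drawn pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def dissimilarity(a, b):
    """Sum of absolute pixel differences between the two distance transforms."""
    da, db = distance_transform(a), distance_transform(b)
    return sum(abs(x - y) for row_a, row_b in zip(da, db)
               for x, y in zip(row_a, row_b))


def classify(image, templates):
    """Return the name of the template with the smallest dissimilarity."""
    return min(templates, key=lambda name: dissimilarity(image, templates[name]))


# Made-up 5x5 templates, for illustration only.
SQUARE = parse(["11111", "10001", "10001", "10001", "11111"])
PLUS = parse(["00100", "00100", "11111", "00100", "00100"])
TEMPLATES = {"square": SQUARE, "plus": PLUS}

# A square with one pixel missing from its bottom edge.
noisy_square = parse(["11111", "10001", "10001", "10001", "11101"])
print(classify(noisy_square, TEMPLATES))  # prints "square"
```

Comparing distance transforms rather than raw pixels makes the score degrade gradually when strokes are slightly displaced or broken, which suits hand-drawn input. It also hints at why outlines of similar size, such as a square and a circle, can end up with close scores.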


Software System

The software system will consist of several parts:

  • The vision system that uses OpenCV and digitizes the characters.
  • The internal controller that takes the digitized representation, produces the synthesized sound (maybe through STK), and drives the visualization.
  • The visual system that shows the camera feed and also some sort of visualization (perhaps simply the notes to be played and the organization of different tracks).

There are currently no plans to have any networked component to this.


Mostly friends and classmates will test this. A measure of goodness is how many times they say "cool" during testing.


I will be working on this by myself.


  • 11/16/2009: Basic language ideas. Ability to recognize some characters in the vision component.
  • 11/30/2009: Can recognize most of the characters and convert to internal representation. Can produce sound.
  • 12/10/2009: Some visuals. Make the other components more robust. Ready for presentation.