Timbre Class Recognition

(Or, "It's All about the Eternal Quest for Tone.")


Background: The name of this class suits it well, given that its activities center on understanding and modeling the creative processes of musical interaction. I got the idea for this network, quite honestly, by chance, while I was logged into a computer behind the ballroom near the copy machine. I heard Dan Levitin lecturing about an artist who attempted recreations of popular music. He pointed out how difficult it is to copy a certain guitar sound. I heard the guitar sound and, as a guitar player myself, thought about how I could get that particular sound. Then I thought that I might be able to create a network that works out how to get a particular sound once it is given one that sounds quite like it. It would be a network that can recognize timbre in order to pick a similar-sounding instrument, which would simply be a stored timbre.

Approaches: As of now, I am aware that there have been other approaches and methods in this area of modeling perception, but I have not yet been able to research them. I will provide an analysis of these methods in the final draft of the project.

Definition of the task: The first task of the network is to learn three timbre representations from each of three timbre classes. The second task is to receive a timbre representation and tell which of the three classes it falls into.
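To make the two tasks concrete, here is a minimal sketch of how the training targets and the final decision could be laid out. The class names are placeholders (the three classes have not been chosen here), and the helper names one_hot, build_targets and decide are my own for illustration, not part of any existing package.

```python
# Hypothetical class names; the proposal does not yet name the three classes.
CLASSES = ["class_a", "class_b", "class_c"]
EXAMPLES_PER_CLASS = 3


def one_hot(index, size=len(CLASSES)):
    """Target vector with 1.0 at the class position and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_targets():
    """Task one: pair every (class, example) slot with its target vector.

    Three classes times three examples gives nine training items.
    """
    return [(name, k, one_hot(c))
            for c, name in enumerate(CLASSES)
            for k in range(EXAMPLES_PER_CLASS)]


def decide(outputs):
    """Task two: report the class whose output unit is most active."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[best]
```

With this layout, decide([0.1, 0.8, 0.3]) picks the second class, since its output unit has the highest activation.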

Design, learning set, architecture and representation: The network is a sequential network, like Mike Jordan's. At first it will have the same number of input and hidden units, a number that depends on the sample rate and the size of the sampled sound file. The number of hidden units is a variable that I will test throughout this project: are as many hidden units as input units necessary? See the Nichols-Serrano conference results for details. The output layer will consist of three units, one for each timbre class the network can report when it is fed a timbre representation. I will print out the attack envelope for around 50-100 ms of sound to produce integers, then transform them into binary number representations that the network can learn as timbres.
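The representation step can be sketched roughly as follows. This is only one possible reading of the plan, under assumptions I have not settled on: each sample is reduced to its magnitude, clipped to an 8-bit range, and all of the bits are laid side by side as one input vector. The function names and the 8-bit width are hypothetical.

```python
def attack_length(sample_rate, attack_ms):
    """Number of samples covering the attack portion of the sound."""
    return sample_rate * attack_ms // 1000


def to_bits(value, width=8):
    """Fixed-width binary digits of a non-negative integer, most significant first."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in the given width")
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def encode_attack(samples, sample_rate, attack_ms=50, width=8):
    """Flatten the attack samples into one bit vector for the input layer.

    Assumption: the magnitude of each sample stands in for the envelope,
    and magnitudes above the largest representable value are clipped.
    """
    n = attack_length(sample_rate, attack_ms)
    peak = 2 ** width - 1
    bits = []
    for s in samples[:n]:
        bits.extend(to_bits(min(abs(s), peak), width))
    return bits
```

Under these assumptions the input layer size follows directly from the sample rate and the attack length: at 8000 samples per second, 50 ms is 400 samples, which comes to 3200 input units at 8 bits per sample. That figure is a reason to test smaller hidden layers.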

What experiments do you have in mind for this network, Gabe? I want to produce and visualize results. What do I expect from a three-layer sequential network? I expect that the network will learn the timbres so that it can recognize similar timbres in fewer and fewer epochs. I also want to see how the results change as the number of hidden units decreases.
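The hidden-unit experiment could be set up along these lines. To keep the sketch short it uses a plain three-layer feedforward network trained with ordinary backpropagation, not the sequential network with state units described above, so it shows only the shape of the experiment. The learning rate, tolerance, epoch limit and function names are all assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_until_learned(patterns, n_hidden, max_epochs=2000, rate=0.5,
                        tol=0.1, seed=0):
    """Train with online backprop and return the first epoch in which every
    output was within `tol` of its target, or max_epochs if none was."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    # Each weight row ends with a bias weight.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
           for _ in range(n_out)]
    for epoch in range(1, max_epochs + 1):
        worst = 0.0  # largest output error seen during this epoch
        for x, t in patterns:
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_o]
            worst = max(worst, max(abs(ti - yi) for ti, yi in zip(t, y)))
            # Error signals, computed before any weight is changed.
            d_o = [(ti - yi) * yi * (1 - yi) for ti, yi in zip(t, y)]
            d_h = [h[j] * (1 - h[j]) * sum(d_o[k] * w_o[k][j]
                                           for k in range(n_out))
                   for j in range(n_hidden)]
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    w_o[k][j] += rate * d_o[k] * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_h[j][i] += rate * d_h[j] * xb[i]
        if worst <= tol:
            return epoch
    return max_epochs


def sweep_hidden_units(patterns, hidden_sizes):
    """Epoch count for each hidden-layer size, ready to be graphed."""
    return {n: train_until_learned(patterns, n) for n in hidden_sizes}
```

Calling sweep_hidden_units with a list of (input bits, target vector) pairs and a list of sizes such as [8, 4, 2] gives one epoch count per size, which is the kind of result I would want to graph.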

Future work could extend and improve the design to include more timbre classes, more instruments per class, further tuning of the expressions, and more visual output. Expressions could contribute to better and more mathematically significant graphs. The results I aim to achieve with this project depend upon changing a few design characteristics to see where the errors are made and how they can be reduced.

Gabriel J. Serrano
Last modified: Fri Feb 26 16:07:29 PST 1999