This tool is designed to match a song from any segment of that song, played in any key.
Concept: using a pitch tracker, record the frequencies of each voice into an array, then store the values as frequency "steps" (the intervals between successive notes). An input sound segment can then be compared against this recorded voice.
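Because only the steps between successive pitches are stored, a melody and any transposition of it produce identical arrays, which is what makes the matching key-independent. A minimal Python sketch of the idea (the project itself is written in ChucK; the function name `to_steps` is hypothetical):

```python
# Convert a sequence of MIDI key numbers into interval "steps"
# (differences between successive notes).
def to_steps(midi_notes):
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]

melody = [64, 67, 69, 71, 72, 71]     # a phrase in one key
transposed = [n + 5 for n in melody]  # the same phrase, up a fourth

# Both renditions yield the same step sequence, so matching on steps
# ignores the key the segment was played in.
assert to_steps(melody) == to_steps(transposed)
```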
Features:
The frequencies can be collected in two ways: by sampling at regular intervals, or by tracking the length of each note and storing the notes individually.
Problems encountered with regular sampling: without knowing the shortest note length in advance, information can be lost if the sampling interval is longer than the shortest note.
Problems encountered with tracking note lengths: storing note lengths is more difficult to implement, and note onset/offset detection is hard for a pitch tracker because of the shape of the sound envelope.
For this project, I used MIDI input and stored MIDI key numbers, because MIDI events are discrete, which makes note lengths easier to determine.
This project is separated into a few classes.
The voice class is an object storing the key differences and the note lengths. It is initialized with a MIDI array of [midi_number, tempo_num] pairs, as modeled in the Greensleeves example. The FindBestMatch method takes a segment in the same form, then traverses the voice, comparing as it goes, to find the best matching sequence within that voice. Comparison is done by finding the angle between the two vectors via their dot product. The index of the best match is returned.
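The matching step described above can be sketched as a sliding-window comparison: the query segment's step vector is compared against every equal-length window of the voice, and the window with the smallest angle (largest cosine) wins. This is an illustrative Python analogue, not the ChucK implementation; the function name `find_best_match` and its return shape are assumptions:

```python
import math

def find_best_match(voice_steps, segment_steps):
    """Slide the segment along the voice; return the index where the
    angle between the two step vectors (via dot product) is smallest,
    along with that cosine similarity."""
    n = len(segment_steps)
    best_index, best_cos = -1, -1.0
    for i in range(len(voice_steps) - n + 1):
        window = voice_steps[i:i + n]
        dot = sum(a * b for a, b in zip(window, segment_steps))
        norm = (math.sqrt(sum(a * a for a in window)) *
                math.sqrt(sum(b * b for b in segment_steps)))
        # A zero norm means one vector is all zeros (repeated notes);
        # treat it as a perfect match only if both vectors are identical.
        cos = dot / norm if norm else (1.0 if window == segment_steps else 0.0)
        if cos > best_cos:
            best_cos, best_index = cos, i
    return best_index, best_cos

# The segment [-1, 2] matches the voice exactly at index 2.
idx, cos = find_best_match([2, 2, -1, 2, 2], [-1, 2])
# idx == 2, cos == 1.0
```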
The musicDB class is a simple data structure that stores a number of voices. It also implements a findBestMatch method, which traverses its stored list of voices and finds the best matching voice for a given segment. The best matching song and the matched index are returned.
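A toy analogue of this database layer, again in Python rather than ChucK, shows how the per-voice search composes into a whole-database search (class and method names here are hypothetical):

```python
import math

class MusicDB:
    """Stores named voices (as step sequences) and returns the
    best-matching (song, index) pair for a query segment."""
    def __init__(self):
        self.voices = {}  # song name -> list of interval steps

    def add(self, name, steps):
        self.voices[name] = steps

    def find_best_match(self, segment):
        best = (None, -1, -1.0)  # (song, index, cosine similarity)
        for name, steps in self.voices.items():
            # Compare the segment against every window of this voice.
            for i in range(len(steps) - len(segment) + 1):
                window = steps[i:i + len(segment)]
                dot = sum(a * b for a, b in zip(window, segment))
                norm = (math.sqrt(sum(a * a for a in window)) *
                        math.sqrt(sum(b * b for b in segment)))
                cos = dot / norm if norm else 0.0
                if cos > best[2]:
                    best = (name, i, cos)
        return best[0], best[1]

db = MusicDB()
db.add("ascending", [2, 2, 1, 2])
db.add("descending", [-2, -2, -1, -2])
song, idx = db.find_best_match([-1, -2])
# song == "descending", idx == 2
```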
Using these two classes, one can load various songs into the musicDB object, then play a sequence on a MIDI input to find the best matching song. An example usage is here, which uses the S.M.E.L.T. (http://smelt.cs.princeton.edu/) KeyboardToPitch sample program, which I wrapped in a class for easier use.
The resulting program matches a correctly played input segment fairly consistently. However, using the vector dot product for error tolerance means some mistakes are penalized much more heavily than others. In addition, since key is ignored in matching and tempo is not used in the comparison, different songs can easily contain identical 'segments,' so more notes must be played for accurate recognition.
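The uneven tolerance follows directly from the cosine measure: a step that is off by a semitone barely moves the angle, while one badly wrong interval drags the similarity down sharply. A small illustrative computation (Python, with made-up step vectors):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two step vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

target = [2, 2, -1, 2]     # intended step sequence
near_miss = [2, 3, -1, 2]  # one interval off by a semitone
far_miss = [2, 9, -1, 2]   # one interval off by a seventh

# The semitone error stays close to the target (cosine ~0.98), while the
# single large error pulls the similarity down to ~0.79: mistakes of the
# same "count" are tolerated very differently.
assert cosine(target, near_miss) > cosine(target, far_miss)
```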
The project could be much improved with accurate pitch tracking, which would remove the restriction to MIDI input/output. Also, to take note lengths into account in the comparison, a scheme other than the dot product would need to be developed. The program has not been tested on a large database, and the algorithm of traversing every song in its entirety may not scale; dividing each voice into measures or even larger 'chunks' may improve performance.
The complete project files can be downloaded here. voice.ck, musicDB.ck, and keyInput.ck must be compiled before setup.ck. The keyboard is mapped to MIDI notes as documented in the comments in keyInput.ck.