CS229 Machine Learning Project Proposal
To implement and use Hidden-Markov Models (HMMs) for the for the purpose of understanding, transcribing, and modeling the musical parameters of an acoustic drum set.
The focus of the project will consist of surveying current techniques of HMMs used in automatic speech recognition and apply similar methods to model each sub-instrument of the drum set including the bass drum, floor tom, snare, mid-toms, hi-hat, crash cymbal, and ride cymbal. The training of the instrument models will be performed using time and frequency parameters of pre-recorded isolated drum sounds, which can easily be obtained via drum-sample libraries used in music production for negligible cost. The first-stage results should be able to accurately and independently identify each sub-instrument.
Once successful models of each sub-instrument are obtained, a higher-level set of HMMs can be used to model a rhythmic pattern, where a single state of the HMM model would represent a single musical beat or duration of time. The output of the first level can help train the second and eventually vice versa. Multiple levels of HMMs can be used to break down the larger problem of instrument/rhythm transcription, similar to speech recognition where one level is used to model the multiple phonemes of the vocal tract and the next models individual words of a given language. A single musical beat can be represented as a linear combination of the sub-instruments and trained on entire measures of prerecorded material as opposed to single instrument samples. Because of the almost unlimited variability of rhythmic patterns, training should focus on a single genre of pattern as proof of concept and then move towards a more generalized set of pattern models. The second-stage results should be able to simultaneously identify combinations of the drum sub-instruments in reference to time. See figure below.
The motivation behind understanding and transcribing acoustic drum set sounds and patterns can be found throughout music information retrieval, music education, music production, and live human-machine music interaction performance. With respect to music information retrieval, this work could eventually be used for an entire model of a pop song, genre classification, and more. For music education purposes, students could play alongside a software program using the model and obtain feedback on their performance. In music production, large databases of audio samples and recordings could be more easily sorted and automatically identified knowing the rhythmic patterns. With respect to live human-machine music interaction, the rhythm model could be used in suggesting an accompanying drum pattern for a given piece of work, help compose new rhythmic variations automatically, or improvise alongside a human performer.
Students: Nicholas J. Bryan
Professors: Ge Wang, Andrew Ng
Milestone: 12:00pm Friday, 11/16 Poster Presentation: Morning of Wednesday, 12/12 Final Writeup: 12:00am Friday 12/14
Papers to Read/Concepts to Understand
HMMs and how they are used for speech recognition
How the knowledge of HMMs/speech recognition can be applied to rhythm understanding
ISMIR Papers of Relevance
additional, potential references
Hidden Markov Model Toolkit (HTK) HTK is primarily used for speech recognition research