Improving representations of music audio for music information retrieval

Fri, 04/13/2012 - 1:15pm - 2:30pm
CCRMA Seminar Room
Event Type: 
Hearing Seminar
There are lots of tools we'd like to build to analyze and respond to music. Denizens of the Hearing Seminar probably prefer an approach based on features. It's worked well for humans, so why not make it work for machines? But machine-learning systems are agnostic---they use whatever data you give them. Much of the original work on music information retrieval was based on MFCCs, a perceptually motivated representation designed for speech that throws away the pitch! How can one look at music and ignore, for example, the pitch?!
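To make the MFCC point concrete, here is a minimal numpy/scipy sketch of the standard MFCC pipeline (power spectrum, triangular mel filterbank, log, DCT). All names and parameter values are illustrative, not from the talk. The key step is keeping only the low-order DCT coefficients, which smooths over the harmonic fine structure and is why MFCCs are largely blind to pitch:

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr=22050, n_mels=40, n_ceps=13):
    """MFCCs for one audio frame (illustrative, textbook-style pipeline)."""
    # Power spectrum of the Hamming-windowed frame
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fbank = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    # Log mel energies, then DCT; truncating to low-order coefficients
    # keeps the spectral envelope but discards most harmonic (pitch) detail.
    log_mel = np.log(fbank @ spectrum + 1e-10)
    return dct(log_mel, norm='ortho')[:n_ceps]

# A 440 Hz tone: its MFCCs describe the envelope, not the pitch per se.
sr, n = 22050, 2048
tone = np.sin(2 * np.pi * 440.0 * np.arange(n) / sr)
coeffs = mfcc_frame(tone, sr=sr)
print(coeffs.shape)  # (13,)
```

The mel filterbank and the truncated DCT were tuned for speech timbre; for music, features that keep pitch and harmonic structure are often a better starting point, which is the motivation for the talk.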

I'm happy to introduce Philippe Hamel to the Hearing Seminar. He'll be at CCRMA this week to talk about his work on learning better features for music audio. He's interning at Google, and undoubtedly has lots of good stories (and data).

    Who:    Philippe Hamel (Google and Univ. of Montreal)
    Why:    Music is fun (and learning from it is even better.)
    What:    Improving representations of music audio for music information retrieval.
    When:    Friday April 13th at 1:15PM
    Where:    CCRMA Seminar Room - Top Floor of the Knoll

This coming Friday! Bring your music ears (and their feature detectors) and Philippe will perhaps improve on them!

- Malcolm

Improving representations of music audio for music information retrieval.

The music information retrieval (MIR) field depends on machine
learning to solve a multitude of tasks. Many of these tasks, such as
auto-tagging, genre classification, instrument recognition, music
similarity and automatic transcription, rely heavily on features
extracted from music audio. However, the importance of the choice of
these audio features is often overlooked. Often, audio features throw
away too much relevant musical information from the signal, or are
blind to temporal dynamics. Obviously, bad features will hinder the
performance of any machine learning system learning over them, however
complex that system may be.

I will discuss simple methods to obtain general purpose audio features
that are well suited to preserve music information and have desirable
properties such as temporal and spectral shift invariance. I will also
discuss how machine learning, either unsupervised or supervised, can
be incorporated in the feature extraction to obtain more dataset-specific
or task-specific features. Finally, I will present a few
recent methods that have been proposed to capture longer-term dynamics
by using combinations of features at different time scales.
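The shift-invariance idea mentioned in the abstract can be illustrated with a toy numpy sketch (this is not the speaker's actual method, just a minimal demonstration): pooling a time-frequency representation over the time axis yields a feature that is unchanged when the same content occurs earlier or later.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "spectrogram": 128 frequency bins x 100 time frames
spec = rng.random((128, 100))

# Pooling over the whole time axis gives one value per frequency band.
feat = spec.max(axis=1)

# Circularly shifting the content in time leaves the pooled feature
# unchanged: temporal shift invariance.
feat_shifted = np.roll(spec, 7, axis=1).max(axis=1)
print(np.allclose(feat, feat_shifted))  # True

# The same trick applied along the frequency axis (e.g. pooling over
# bins within a band) gives a degree of spectral shift invariance.
```

In practice one pools over windows at several time scales rather than the whole signal, which is one way to start capturing the longer-term dynamics the abstract refers to.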

Philippe Hamel is a Ph.D. student in computer science at Université de
Montréal, under the supervision of Douglas Eck and Yoshua Bengio. His
main research interest is machine learning and its applications to the
music domain. His recent work has been focused on artificial neural
networks, deep learning and signal processing. He is interested in
music information retrieval problems such as automatic tagging, music
classification and music recommendation. Prior to his work in computer
science, he studied physics at Université de Montréal where he
obtained his M.Sc. in theoretical physics (2006). Philippe is
currently doing an internship at Google Inc., in Mountain View, CA,
working on music recommendation.

Open to the Public