Tom Walters - The Intervalgram: An Audio Feature for Large-scale Melody Recognition

Fri, 04/27/2012 - 1:15pm - 2:30pm
CCRMA Seminar Room
Event Type: 
Hearing Seminar
Melody recognition is hard, not least because a song can be transposed without changing the underlying melody. It is easy to consider all transpositions, but that extra complexity becomes a real issue for large-scale melody recognition. Tom Walters, a research scientist at Google, will be talking about his tests on scaling melody recognition to very large databases. I'm sure you can imagine why this might be important to Google. :-)

    Who:    Tom Walters (Google)
    Why:    Melodies are an interesting feature of audio
    What:    The Intervalgram: An Audio Feature for Large-scale Melody Recognition
    When:    Friday April 27th at 1:15PM
    Where:    CCRMA Seminar Room (Top Floor of the Knoll)

Think about your favorite melodies and bring them to CCRMA.

- Malcolm

The Intervalgram: An Audio Feature for Large-scale Melody Recognition
Tom Walters (Google)

The ‘intervalgram’ is a summary of the local pattern of musical intervals in a segment of music. It is based on a chroma representation derived from the temporal profile of the stabilized auditory image and is made locally pitch invariant by means of a ‘soft’ pitch transposition to a local reference. Sets of intervalgrams are used as the basis of a system for detection of identical melodies across a database of music. Using a dynamic-programming approach for comparisons between a reference and the song database, we evaluated performance on the ‘covers80’ dataset. A first test of an intervalgram-based system on this dataset yields a precision at top-1 of 53.8%, with an ROC curve that shows very high precision up to moderate recall, suggesting that the intervalgram is adept at identifying the easier-to-match cover songs in the dataset with high robustness. The intervalgram is designed to support locality-sensitive hashing, such that an index lookup from each single intervalgram feature has a moderate probability of retrieving a match, with few false matches. With this indexing approach, a large reference database can be quickly pruned before more detailed matching, as in previous content-identification systems.
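The 'soft' pitch transposition described above can be illustrated with a small sketch. The following is not the published algorithm, just a minimal illustration of the idea: each chroma frame is circularly shifted by a softmax-weighted mixture of shifts derived from a local reference frame, so that transposing the whole input leaves the output unchanged. The function names, the softmax weighting, and the `sharpness` parameter are all illustrative assumptions.

```python
import numpy as np

def soft_reference(chroma_frame, sharpness=10.0):
    # Softmax over the 12 chroma bins: a 'soft' choice of reference
    # pitch class instead of a hard argmax (illustrative assumption).
    w = np.exp(sharpness * chroma_frame)
    return w / w.sum()

def intervalgram_sketch(chroma, ref_index=0):
    """Re-express chroma frames relative to a soft local reference.

    chroma: (frames, 12) array of chroma vectors.
    Returns an array of the same shape whose values depend only on
    intervals relative to the reference, not on absolute pitch class.
    """
    weights = soft_reference(chroma[ref_index])
    out = np.zeros_like(chroma)
    for shift, w in enumerate(weights):
        # Weighted sum over all 12 circular transpositions: frames are
        # pulled toward alignment with the reference pitch class.
        out += w * np.roll(chroma, -shift, axis=1)
    return out
```

Because both the reference weights and the frames shift together under a global transposition, the circular-shift mixture cancels the shift exactly: transposing every frame by the same number of semitone bins yields the same output, which is the invariance the intervalgram relies on.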

Tom is a research scientist at Google in Mountain View, where he works on applications of machine hearing to large-scale audio analysis problems. Recent applications have included sound-effects search, video content analysis, and cover song recognition. Before joining Google, Tom was at the University of Cambridge, where he completed an MSci in experimental and theoretical physics and a PhD at the Centre for the Neural Basis of Hearing under the supervision of Roy Patterson. His PhD research focused on applications of the Auditory Image Model to various audio analysis tasks, including scale-invariant and noise-robust speech recognition.
Open to the Public