Tom Walters - The Intervalgram: An Audio Feature for Large-scale Melody Recognition
Who: Tom Walters (Google)
Why: Melodies are an interesting feature of audio
What: The Intervalgram: An Audio Feature for Large-scale Melody Recognition
When: Friday April 27th at 1:15PM
Where: CCRMA Seminar Room (Top Floor of the Knoll)
Think about your favorite melodies and bring them to CCRMA.
The Intervalgram: An Audio Feature for Large-scale Melody Recognition
Tom Walters (Google)
The ‘intervalgram’ is a summary of the local pattern of musical intervals in a segment of music. It is based on a chroma representation derived from the temporal profile of the stabilized auditory image and is made locally pitch invariant by means of a ‘soft’ pitch transposition to a local reference. Sets of intervalgrams are used as the basis of a system for detection of identical melodies across a database of music. Using a dynamic-programming approach for comparisons between a reference and the song database, we evaluated performance on the ‘covers80’ dataset. A first test of an intervalgram-based system on this dataset yields a precision at top-1 of 53.8%, with an ROC curve that shows very high precision up to moderate recall, suggesting that the intervalgram is adept at identifying the easier-to-match cover songs in the dataset with high robustness. The intervalgram is designed to support locality-sensitive hashing, such that an index lookup from each single intervalgram feature has a moderate probability of retrieving a match, with few false matches. With this indexing approach, a large reference database can be quickly pruned before more detailed matching, as in previous content-identification systems.
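To give a flavor of the ‘soft’ pitch transposition idea, here is a minimal sketch in NumPy. It is not the paper's implementation: it assumes 12-bin chroma vectors and models the soft transposition as a blend of all twelve circular rotations of a chroma vector, weighted by a local reference chroma, so that the resulting feature depends only on intervals relative to the reference rather than on absolute pitch.

```python
import numpy as np

def soft_transpose(chroma, ref):
    """Blend all 12 circular rotations of a 12-bin chroma vector,
    weighted by the reference chroma, so the result encodes intervals
    relative to the reference rather than absolute pitch classes.
    (Illustrative sketch only, not the paper's exact method.)"""
    shifts = np.stack([np.roll(chroma, -k) for k in range(12)])
    weights = ref / ref.sum()
    return weights @ shifts

# Two renditions of the same chord, a tritone (6 semitones) apart
a = np.zeros(12)
a[[0, 4, 7]] = 1.0      # C major triad as a chroma vector
b = np.roll(a, 6)       # F-sharp major triad

fa = soft_transpose(a, a)
fb = soft_transpose(b, b)
print(np.allclose(fa, fb))  # True: identical after soft transposition
```

Because each feature is a sum of self-relative rotations, transposing the input by any number of semitones leaves the output unchanged, which is the property that lets intervalgrams match cover versions performed in different keys.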
Tom is a research scientist at Google in Mountain View, where he works on applications of machine hearing to large-scale audio analysis problems. Recent applications have included sound-effects search, video content analysis, and cover song recognition. Prior to Google, Tom was at the University of Cambridge, where he completed an MSci in experimental and theoretical physics and a PhD at the Centre for the Neural Basis of Hearing, under the supervision of Roy Patterson. His PhD research focused on applications of the Auditory Image Model to various audio analysis tasks, including scale-invariant and noise-robust speech recognition.