Thesis Defense: Interactive Sound Source Separation

Date: Fri, 11/22/2013 - 1:00pm - 3:30pm
Location: CCRMA Stage
Event Type: Other
In applications such as audio denoising, music transcription, music remixing, and audio-based forensics, it is desirable to decompose a single-channel recording into its respective sources. One of the most promising and effective classes of methods for doing so is based on non-negative matrix factorization (NMF) and related probabilistic latent variable models (PLVM). Such techniques, however, typically perform poorly when no isolated training data is available, and they offer no mechanism for improving unsatisfactory results.
 
To overcome these issues, we present a new interaction paradigm and separation algorithm for single-channel source separation. The method works by allowing an end-user to roughly paint on time-frequency displays of sound. The rough annotations are then used to inform an NMF/PLVM algorithm and perform an initial separation. The output estimates are presented back to the user, and the entire process is repeated until a desired result is achieved. In doing so, we 1) eliminate the need for the isolated training data required by NMF/PLVM techniques, 2) minimize the annotation effort required of end-users, and 3) greatly improve separation quality, thus making past methods more general and more powerful.
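
As a rough illustration of how painted annotations could steer a factorization, the Python sketch below runs EM-style multiplicative updates for a KL-divergence NMF/PLVM-like model in which each source has its own penalty map over the time-frequency plane. It is a simplified sketch under stated assumptions, not the algorithm or code of the released software: the function name `separate_with_annotations`, the `penalties` representation (larger values mean "this source should not explain this bin"), and the use of `exp(-penalty)` as a per-bin gain are all illustrative choices.

```python
import numpy as np

def separate_with_annotations(V, penalties, n_components=2, n_iter=100, seed=0):
    """Hypothetical helper: split a non-negative spectrogram V (freq x time)
    into one estimate per source, guided by user-painted penalty maps.

    penalties: list of (freq x time) non-negative arrays, one per source.
    A large value at a bin discourages that source from explaining the bin.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    eps = 1e-12

    # Assumption: annotations act as multiplicative gains exp(-penalty).
    gains = [np.exp(-p) for p in penalties]

    # One random non-negative dictionary (W) and activation matrix (H) per source.
    W, H = [], []
    for _ in penalties:
        w = rng.random((n_freq, n_components)) + eps
        W.append(w / w.sum(axis=0, keepdims=True))
        H.append(rng.random((n_components, n_time)) + eps)

    for _ in range(n_iter):
        # Penalized model of each source, and the resulting mixture model.
        models = [g * (w @ h) for g, w, h in zip(gains, W, H)]
        ratio = V / (sum(models) + eps)
        for s in range(len(penalties)):
            Q = gains[s] * ratio
            W_new = W[s] * (Q @ H[s].T)   # re-estimate spectral shapes
            H_new = H[s] * (W[s].T @ Q)   # re-estimate activations
            W[s] = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
            H[s] = H_new

    # Soft-mask the input so the source estimates add back up to V.
    models = [g * (w @ h) for g, w, h in zip(gains, W, H)]
    total = sum(models) + eps
    return [V * m / total for m in models]
```

In an interactive loop of the kind described above, a user would inspect the returned estimates, paint additional penalties where a source leaks into the wrong output, and rerun the function until the result is acceptable.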
 
For evaluation, we developed and released an open-source software project embodying our approach, conducted user studies, and submitted separation results to the fourth, community-based, signal separation evaluation campaign. For a variety of real-world tasks, we found that expert users of our software can achieve state-of-the-art separation quality and inexperienced users can achieve good separation quality with minimal instruction.  To download the application, code, and audio/video demonstrations, please see http://isse.sourceforge.net

Bio: Nicholas J. Bryan is a PhD candidate working with Prof. Ge Wang at CCRMA and Gautham J. Mysore at Adobe Research. His research interests lie at the intersection of signal processing, machine learning, and human-computer interaction.
FREE
Open to the Public