Interactive Sound Source Separation

For my dissertation, I'm working on the problem of taking a single audio recording (e.g. a pop song) and separating it into its constituent sound sources (e.g. drums, bass, vocals, etc.). In particular, I'm interested in the idea of a source separation audio editor, as opposed to more traditional, sequential audio editors. In this setting, we can leverage interactive user feedback to inform source separation algorithms and improve separation quality.

To incorporate user feedback, we allow end-users to roughly draw or paint on spectrogram displays of sound. Color denotes the sound source, and opacity serves as a measure of confidence or strength. The time-frequency painting annotations are then used to update the separation estimates and iteratively refine the results. Because of this emphasis on user feedback, we use the term "interactive" source separation for this work. A short demo video is shown below.

[Video: Interactive Single-Channel Sound Source Separation Demo]

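The papers below develop posterior-regularized latent variable models for this refinement loop. As a much simpler, hypothetical illustration of the core masking idea, the sketch here converts per-source painting annotations (opacity values in [0, 1] over time-frequency bins) into soft masks and applies them to a mixture spectrogram; the function names, array shapes, and the uniform "paint" values are all invented for this example and are not the method from the papers.

```python
import numpy as np

def annotation_masks(annotations, floor=1e-3):
    # annotations: hypothetical array of shape (n_sources, n_freq, n_frames),
    # holding the opacity each user painted for each source at each bin.
    # Flooring avoids division by zero where nothing was painted.
    weights = np.maximum(annotations, floor)
    # Normalize across sources so the masks sum to 1 at every bin.
    return weights / weights.sum(axis=0, keepdims=True)

def apply_masks(magnitude, masks):
    # Split a mixture magnitude spectrogram into per-source estimates.
    return masks * magnitude[np.newaxis, ...]

# Toy example: two sources, 4 frequency bins, 3 time frames.
rng = np.random.default_rng(0)
mag = rng.random((4, 3))
paint = np.stack([np.full((4, 3), 0.8),   # source 1 painted strongly
                  np.full((4, 3), 0.2)])  # source 2 painted weakly
masks = annotation_masks(paint)
estimates = apply_masks(mag, masks)
# Soft masks partition energy: the estimates sum back to the mixture.
assert np.allclose(estimates.sum(axis=0), mag)
```

In the actual interactive system, annotations like these regularize a statistical separation model rather than masking the spectrogram directly, which is what lets rough, partial paint strokes still improve the full separation.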
Papers

"An Efficient Posterior Regularized Latent Variable Model for Interactive Sound Source Separation."
N. J. Bryan, G. J. Mysore
International Conference on Machine Learning, Atlanta, GA. June 2013.
(web, paper)
"Interactive Refinement of Supervised and Semi-Supervised Sound Source Separation Estimates."
N. J. Bryan, G. J. Mysore
IEEE International Conference on Acoustics, Speech, and Signal Processing, Vancouver, Canada. May 2013.
(web, paper)
"Interactive User-Feedback for Sound Source Separation."
N. J. Bryan, G. J. Mysore
International Conference on Intelligent User Interfaces, Workshop on Interactive Machine Learning (Extended Abstract/Workshop Paper), Santa Monica, CA. March 2013.
(abstract)
