Dynamics Modeling in Sound Mixtures

Wed, 02/24/2010 - 5:15pm
CCRMA Classroom [Knoll 217]
Event Type: DSP Seminar

Gautham J. Mysore

To model sounds accurately, we need to exploit as much of their structure as possible. Dictionary learning methods such as non-negative matrix factorization (NMF) and probabilistic latent component analysis (PLCA) model the spectral structure of sounds well. However, they do not provide a statistical description of the temporal structure (dynamics) of sounds. The importance of dynamics is demonstrated by the use of hidden Markov models (HMMs) in speech recognition. HMMs, however, have a rigid observation model that is poorly suited to capturing variations in spectral structure across different occurrences of a single state. We propose a novel algorithm that jointly learns the spectral structure of sounds and a statistical description of their dynamics. Rather than learning a single dictionary to characterize spectral structure, the algorithm learns several small dictionaries, each describing a different aspect of the sound, and jointly learns a Markov chain that describes the transitions between these dictionaries. We then propose a method for combining models of individual sounds through an additive interaction model. This yields a model of multiple sound sources that incorporates the spectral and temporal structure of each source. It is a general model for mixtures of sounds and can be used for various inference tasks; we discuss its application to source separation.
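The abstract does not give the algorithm's details. As a rough illustration of the model structure only, the Python sketch below samples spectrogram frames from a toy version of the generative model described above: a Markov chain selects which small dictionary is active at each frame, each frame is a non-negative combination of that dictionary's spectral atoms, and two such sources are summed under an additive interaction model. All names, dimensions, and numeric values are hypothetical, the uniformly random activation weights are an assumption, and the learning and inference steps are omitted entirely; this is not the presenter's implementation.

```python
import random

def sample_markov_chain(pi, A, T, rng):
    """Draw a state sequence of length T from initial probabilities pi and transition matrix A."""
    states = [rng.choices(range(len(pi)), weights=pi)[0]]
    for _ in range(T - 1):
        prev = states[-1]
        states.append(rng.choices(range(len(A[prev])), weights=A[prev])[0])
    return states

def generate_source(dictionaries, pi, A, T, rng):
    """Sample (states, frames) for one source.

    Each state indexes a small dictionary of non-negative spectral atoms; the frame
    at time t is a non-negative combination of the atoms of the active dictionary.
    """
    states = sample_markov_chain(pi, A, T, rng)
    frames = []
    for z in states:
        atoms = dictionaries[z]
        # Hypothetical choice: activation weights drawn uniformly from [0, 1).
        weights = [rng.random() for _ in atoms]
        n_freq = len(atoms[0])
        frame = [sum(w * atom[f] for w, atom in zip(weights, atoms)) for f in range(n_freq)]
        frames.append(frame)
    return states, frames

def mix(frames_a, frames_b):
    """Additive interaction: the mixture spectrum is the element-wise sum of the source spectra."""
    return [[a + b for a, b in zip(fa, fb)] for fa, fb in zip(frames_a, frames_b)]

rng = random.Random(0)

# Toy source 1: two states, each with a dictionary of two 4-bin atoms.
dicts_1 = [
    [[0.7, 0.2, 0.1, 0.0], [0.5, 0.5, 0.0, 0.0]],
    [[0.0, 0.1, 0.2, 0.7], [0.0, 0.0, 0.5, 0.5]],
]
pi_1 = [0.5, 0.5]
A_1 = [[0.9, 0.1], [0.1, 0.9]]

# Toy source 2: a different set of dictionaries and dynamics.
dicts_2 = [
    [[0.0, 0.6, 0.4, 0.0], [0.25, 0.25, 0.25, 0.25]],
    [[0.3, 0.0, 0.0, 0.7], [0.1, 0.8, 0.1, 0.0]],
]
pi_2 = [0.3, 0.7]
A_2 = [[0.8, 0.2], [0.3, 0.7]]

states_1, s1 = generate_source(dicts_1, pi_1, A_1, 10, rng)
states_2, s2 = generate_source(dicts_2, pi_2, A_2, 10, rng)
mixture = mix(s1, s2)
```

In source separation the direction is reversed: only the mixture is observed, and the states and activations of each source must be inferred from it. That inference step is not shown here.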