Sound Recognition in Mixtures


We present a method for recognizing sound sources in a mixture. Using source separation ideas based on probabilistic latent component analysis (PLCA), we learn a dictionary for each source, then estimate the relative proportions of the sources in a mixture by decomposing it with those dictionaries and summing the corresponding activations. In addition to the basic model, we introduce a new method for learning temporal dependencies among dictionary elements using a transition matrix. We show that this temporally constrained model gives better results than the basic model.
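
The basic model described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes nonnegative magnitude spectrograms as input and uses the multiplicative updates commonly used for KL-divergence NMF, which match the PLCA EM updates up to normalization. The function names (learn_dictionary, source_levels), the component counts, and the iteration counts are hypothetical choices, and the transition-matrix extension is not modeled here.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def learn_dictionary(V, n_components, n_iter=200, seed=0):
    """Learn a dictionary W (n_freq x n_components) from one source's
    magnitude spectrogram V (n_freq x n_frames).

    Assumption: KL-divergence multiplicative updates with each column of W
    normalized to sum to 1, standing in for PLCA's EM updates.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + 1e-3
    H = rng.random((n_components, n_frames)) + 1e-3
    for _ in range(n_iter):
        R = V / (W @ H + EPS)                      # ratio of data to model
        W *= R @ H.T                               # update dictionary elements
        W /= W.sum(axis=0, keepdims=True) + EPS    # each column sums to 1
        R = V / (W @ H + EPS)
        H *= W.T @ R                               # update activations
    return W


def source_levels(V_mix, dictionaries, n_iter=200, seed=0):
    """Decompose a mixture with fixed per-source dictionaries and return the
    summed activations of each source in each frame (n_sources x n_frames)."""
    W = np.hstack(dictionaries)                    # concatenate all dictionaries
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_mix.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= W.T @ (V_mix / (W @ H + EPS))         # dictionaries stay fixed
    sizes = [D.shape[1] for D in dictionaries]
    bounds = np.cumsum([0] + sizes)
    # Sum the activations that belong to each source's dictionary.
    return np.stack([H[bounds[i]:bounds[i + 1]].sum(axis=0)
                     for i in range(len(sizes))])


if __name__ == "__main__":
    # Synthetic stand-ins for two sources' training spectrograms.
    rng = np.random.default_rng(1)
    V_a = rng.random((64, 100))
    V_b = rng.random((64, 100))
    W_a = learn_dictionary(V_a, n_components=8)
    W_b = learn_dictionary(V_b, n_components=8)
    levels = source_levels(2.0 * V_a + V_b, [W_a, W_b])
    proportions = levels.sum(axis=1) / levels.sum()
    print(proportions)
```

Dividing the per-frame levels by their column sums would give time-varying proportions, comparable in spirit to the bars shown in the demo below.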

Demo Example

This video demo shows the levels of three sound sources (speech, gun, and airplane) in the audio track of a movie. The bars on the left are based on the basic model, and those on the right are based on the improved model (temporally constrained with the transition matrix). The video shows that the improved model produces fewer false alarms, which are marked with red circles.


  • Juhan Nam, Gautham J. Mysore, and Paris Smaragdis, "Sound Recognition in Mixtures," in Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), 2012.