Latent Variable Modeling of Audio
Abstract: In this seminar, we will describe how latent variable models can be used to analyze and process speech and audio signals. We will begin with the basics of latent variable multinomial decompositions and work our way up through various higher-level models that can perform matrix and tensor factorizations, extract shift-invariant features, learn time series, perform sparse coding, and more. We will examine their interpretations and extensions as well as their relationship to other popular machine learning techniques. We will show how this field combines elements from machine learning and signal processing into hybrid algorithms that offer next-generation approaches to some of the most challenging problems in speech and audio processing. We will cover models that can be used effectively for a large number of applications, including signal separation, signal de-noising, speech recognition, pitch tracking, de-reverberation, audio/visual object extraction, user-assisted audio selection, echo cancellation, polyphonic music transcription, missing data imputation, and more.
Gautham J. Mysore is a research scientist at Adobe's Advanced Technology Labs. He received his M.A. and Ph.D. from CCRMA in 2005 and 2010, respectively, and his M.S. in Electrical Engineering from Stanford in 2008. He has previously been a visiting researcher at the Gatsby Computational Neuroscience Unit at University College London. In 2010, he won the best student paper award at the LVA / ICA conference for his work on non-negative hidden Markov models. His research interests include machine learning and signal processing for various audio applications.
Paris Smaragdis is an assistant professor of computer science and electrical and computer engineering at the University of Illinois at Urbana-Champaign. Prior to that, he was a senior research scientist at Adobe's Advanced Technology Labs and a research scientist with Mitsubishi Electric Research Labs (MERL). Prof. Smaragdis obtained his Ph.D. at MIT in 2001 and was a postdoc there in 2002. In 2006, Prof. Smaragdis' research accomplishments were recognized by the MIT Tech Review, which selected him as one of the top young innovators of the year. Prof. Smaragdis' research interests are applications of machine learning to audio signal processing problems, machine listening, and computation and the arts. Prof. Smaragdis is a senior member of the IEEE and a member of the MLSP-TC.