Learning the Structure of Spoken Language
Date:
Fri, 10/09/2015 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar
Automatically Learning the Structure of Spoken Language Without Supervision
Aren Jansen, Google Machine Hearing Group
Abstract:
The dominant paradigm in the speech recognition community for the past four decades has been to train automatic systems with as much transcribed data as we can get our hands on. This strategy has led to the development of highly accurate systems that have finally found a place in our daily lives. An unfortunate consequence of this trajectory, however, is that state-of-the-art recognition performance can only be achieved on languages and domains for which vast transcribed training resources either exist or can be easily obtained. Meanwhile, with public internet resources like YouTube and podcasts, untranscribed speech audio is abundant and contains a wealth of hidden information regarding the acoustic-phonetic, lexical, grammatical, and semantic structure of the language being spoken. The trick is uncovering this structure automatically, an endeavor that will require new machine learning techniques, algorithms scalable to massive problem sizes, and a lot of patience. I will provide an overview of my efforts in these directions and describe some useful language- and domain-independent technologies that have been produced along the way.
Bio:
Aren Jansen joined Google in August as a Research Scientist in the Machine Hearing Group. Before that he was a Senior Research Scientist in the Human Language Technology Center of Excellence and an Assistant Research Professor in the Center for Language and Speech Processing, both at Johns Hopkins University. Aren received the B.A. degree in Physics from Cornell University in 2001. He received the M.S. degree in Physics as well as the M.S. and Ph.D. in Computer Science from the University of Chicago in 2003, 2005, and 2008, respectively. Aren’s research has explored a wide range of speech and audio processing topics that include unsupervised/semi-supervised representation learning, speech retrieval, content-based recommendation, latent structure discovery, time series modeling and analysis, and scalable algorithms for big data applications.
FREE
Open to the Public