Learning the Structure of Spoken Language

Date: Fri, 10/09/2015 - 10:30am - 12:00pm

Location: CCRMA Seminar Room

Event Type: Hearing Seminar

How is it that we make sense of the auditory world around us? We are born with no innate knowledge of what it means to be a sound, or how to form words, let alone what a language is. I'm happy to introduce Aren Jansen to the community. While at Johns Hopkins, he developed new techniques for learning language from unlabeled audio, much as a baby does. He'll be at CCRMA to talk about this work, and we can discuss how it might explain audio perception.

Automatically Learning the Structure of Spoken Language Without Supervision
Aren Jansen, Google Machine Hearing Group

Abstract:
The dominant paradigm in the speech recognition community for the past four decades has been to train automatic systems with as much transcribed data as we can get our hands on. This strategy has led to the development of highly accurate systems that have finally found a place in our daily lives. An unfortunate consequence of this trajectory, however, is that state-of-the-art recognition performance can only be achieved on languages and domains for which vast transcribed training resources either exist or can be easily obtained. Meanwhile, with public internet resources like YouTube and podcasts, untranscribed speech audio is abundant and contains a wealth of hidden information regarding the acoustic-phonetic, lexical, grammatical, and semantic structure of the language being spoken. The trick is uncovering this structure automatically, an endeavor that will require new machine learning techniques, algorithms scalable to massive problem sizes, and a lot of patience. I will provide an overview of my efforts in these directions and describe some useful language- and domain-independent technologies that have been produced along the way.

Bio:
Aren Jansen joined Google in August as a Research Scientist in the Machine Hearing Group. Before that he was a Senior Research Scientist in the Human Language Technology Center of Excellence and an Assistant Research Professor in the Center for Language and Speech Processing, both at Johns Hopkins University. Aren received the B.A. degree in Physics from Cornell University in 2001. He received the M.S. degree in Physics as well as the M.S. and Ph.D. in Computer Science from the University of Chicago in 2003, 2005, and 2008, respectively. Aren’s research has explored a wide range of speech and audio processing topics that include unsupervised/semi-supervised representation learning, speech retrieval, content-based recommendation, latent structure discovery, time series modeling and analysis, and scalable algorithms for big data applications.

FREE

Open to the Public
