Rohit Prabhavalkar on Modern DNN Speech Recognition
Date:
Fri, 05/24/2019 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar

I’m really happy that Rohit Prabhavalkar from Google will be at the Hearing Seminar to talk about his Listen-Attend-Spell work. End-to-end recognizers are the latest gold standard for all sorts of speech problems—they were all over ICASSP last week. Waveforms go in, a big neural network thinks about it, and characters or words come out. Most notably, the parameters of the entire network are trained with a single back-propagation calculation. No tweaks necessary. It’s really quite magical. And it works quite well.
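To make "one network, one back-propagation calculation" concrete, here is a deliberately tiny sketch (all data and names are hypothetical, and a single softmax layer stands in for the big network): one set of parameters maps acoustic feature frames directly to character posteriors, and every parameter is updated by the gradient of a single loss — no separately trained acoustic, pronunciation, or language components.

```python
import numpy as np

# Toy end-to-end sketch (hypothetical): a single "network" W maps acoustic
# feature vectors directly to character posteriors, and all of its
# parameters are trained by back-propagating one cross-entropy loss.

rng = np.random.default_rng(0)
CHARS = ["a", "b", "c"]

# toy "acoustic features": 4 frames of 5-dim features, with target chars
X = rng.normal(size=(4, 5))
y = np.array([0, 1, 2, 1])          # target indices into CHARS

W = rng.normal(scale=0.1, size=(5, 3))  # every parameter of the "network"

def forward(X, W):
    """Per-frame character posteriors via a softmax over logits."""
    logits = X @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for step in range(500):
    P = forward(X, W)
    onehot = np.eye(len(CHARS))[y]
    # one gradient, computed from one loss, updates everything at once
    grad = X.T @ (P - onehot) / len(y)
    W -= 0.5 * grad

decoded = "".join(CHARS[i] for i in forward(X, W).argmax(axis=1))
```

A real system replaces the softmax layer with a deep encoder-decoder and the per-frame targets with a sequence loss, but the training recipe is the same single back-propagation pass the announcement describes.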
Who: Rohit Prabhavalkar (Google)
What: End-to-End Modeling For Automatic Speech Recognition
When: Friday May 24th at 10:30AM
Where: CCRMA Seminar Room, Top Floor of the Knoll at Stanford
Why: How do DNNs solve speech recognition?
Bring your favorite phone to CCRMA and it’s likely there will soon be an end-to-end speech recognizer in it.
- Malcolm
End-to-End Modeling For Automatic Speech Recognition
Rohit Prabhavalkar
Abstract: Traditional automatic speech recognition (ASR) systems consist of a set of separate components, namely an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM takes acoustic features as input and predicts a distribution over subword units, typically context-dependent phonemes. The PM, which is traditionally a hand-engineered lexicon, maps the sequence of subword units produced by the acoustic model to words. Finally, the LM assigns probabilities to various word hypotheses. Typically, these modules are either trained independently, or are curated using expert knowledge.
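The three-component pipeline in the abstract can be sketched as a toy scoring function (everything here — the lexicon, the probabilities, the fixed per-frame distributions — is a hypothetical stand-in, not any real system's models): the AM scores subword units against the audio, the PM's lexicon turns the subword sequence into a word, and the LM weights that word.

```python
# Toy sketch of the traditional ASR pipeline: three separately built
# components (AM, PM, LM) combined at scoring time. All values hypothetical.

def acoustic_model(features):
    """AM: acoustic features -> one distribution over phonemes per frame.
    A fixed toy distribution stands in for a trained model."""
    return [{"k": 0.7, "ae": 0.2, "t": 0.1} for _ in features]

# PM: a hand-engineered lexicon mapping phoneme sequences to words.
LEXICON = {("k", "ae", "t"): "cat", ("b", "ae", "t"): "bat"}

def pronunciation_model(phonemes):
    return LEXICON.get(tuple(phonemes))

# LM: assigns probabilities to word hypotheses.
LM_PROBS = {"cat": 0.6, "bat": 0.4}

def language_model(word):
    return LM_PROBS.get(word, 0.0)

def score_hypothesis(features, phonemes):
    """Combine the three independently built components into one score."""
    am_score = 1.0
    for dist, ph in zip(acoustic_model(features), phonemes):
        am_score *= dist.get(ph, 0.0)
    word = pronunciation_model(phonemes)
    if word is None:
        return None, 0.0
    return word, am_score * language_model(word)

word, score = score_hypothesis([0, 1, 2], ["k", "ae", "t"])
# word -> "cat"; score -> 0.7 * 0.2 * 0.1 * 0.6 = 0.0084
```

An end-to-end system replaces all three hand-assembled pieces with one jointly trained network, which is exactly the contrast the talk addresses.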
Over the last several years, there has been a growing interest in developing "end-to-end" speech recognition systems which attempt to learn all of these components jointly in a single system. In this talk, I shall discuss our work involving various algorithmic and modeling improvements to build end-to-end speech recognition systems which surpass the performance of a conventional ASR system. I shall also discuss promising results obtained by applying this approach to the task of multi-lingual and multi-dialect speech recognition. Finally, I shall discuss some of the current challenges with these models and outline future research directions.
Bio: Rohit Prabhavalkar received his PhD in Computer Science and Engineering from The Ohio State University, USA, in 2013. Following his PhD, Rohit joined the Speech Technologies group at Google where he is currently a Staff Research Scientist. At Google, his research has focused primarily on developing compact acoustic models which can run efficiently on mobile devices, and on developing improved end-to-end automatic speech recognition systems. Rohit has co-authored over 30 refereed papers, which have received two best paper awards (ASRU 2017; ICASSP 2018). He currently serves as a member of the IEEE Speech and Language Processing Technical Committee.
FREE
Open to the Public