Rohit Prabhavalkar on Modern DNN Speech Recognition
Date:
Fri, 05/24/2019 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar

I’m really happy that Rohit Prabhavalkar from Google will be at the Hearing Seminar to talk about his Listen-Attend-Spell work. End-to-end recognizers are the latest gold standard for all sorts of speech problems—they were all over ICASSP last week. Waveforms go in, a big neural network thinks about it, and characters or words come out. Most notably, the parameters of the entire network are trained with a single back-propagation calculation. No tweaks necessary. It’s really quite magical. And it works quite well.
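To make "one network, one back-propagation calculation" concrete, here is a deliberately tiny sketch (all data and names are hypothetical, and a single softmax layer stands in for the big network): one set of parameters maps acoustic feature frames directly to character posteriors, and every parameter is updated by the gradient of a single loss — no separately trained acoustic, pronunciation, or language components.

```python
import numpy as np

# Toy end-to-end sketch (hypothetical): a single "network" W maps acoustic
# feature vectors directly to character posteriors, and all of its
# parameters are trained by back-propagating one cross-entropy loss.

rng = np.random.default_rng(0)
CHARS = ["a", "b", "c"]

# toy "acoustic features": 4 frames of 5-dim features, with target chars
X = rng.normal(size=(4, 5))
y = np.array([0, 1, 2, 1])          # target indices into CHARS

W = rng.normal(scale=0.1, size=(5, 3))  # every parameter of the "network"

def forward(X, W):
    """Per-frame character posteriors via a softmax over logits."""
    logits = X @ W
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for step in range(500):
    P = forward(X, W)
    onehot = np.eye(len(CHARS))[y]
    # one gradient, computed from one loss, updates everything at once
    grad = X.T @ (P - onehot) / len(y)
    W -= 0.5 * grad

decoded = "".join(CHARS[i] for i in forward(X, W).argmax(axis=1))
```

A real system replaces the softmax layer with a deep encoder-decoder and the per-frame targets with a sequence loss, but the training recipe is the same single back-propagation pass the announcement describes.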
Who: Rohit Prabhavalkar (Google)
What: End-to-End Modeling For Automatic Speech Recognition
When: Friday May 24th at 10:30AM
Where: CCRMA Seminar Room, Top Floor of the Knoll at Stanford
Why: How do DNNs solve speech recognition?
Bring your favorite phone to CCRMA and it’s likely there will soon be an end-to-end speech recognizer in it.
- Malcolm
End-to-End Modeling For Automatic Speech Recognition
Rohit Prabhavalkar
Abstract: Traditional automatic speech recognition (ASR) systems consist of a set of separate components, namely an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM takes acoustic features as input and predicts a distribution over subword units, typically context-dependent phonemes. The PM, which is traditionally a hand-engineered lexicon, maps the sequence of subword units produced by the acoustic model to words. Finally, the LM assigns probabilities to various word hypotheses. Typically, these modules are either trained independently, or are curated using expert knowledge.
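The three-component pipeline in the abstract can be sketched as a toy scoring function (everything here — the lexicon, the probabilities, the fixed per-frame distributions — is a hypothetical stand-in, not any real system's models): the AM scores subword units against the audio, the PM's lexicon turns the subword sequence into a word, and the LM weights that word.

```python
# Toy sketch of the traditional ASR pipeline: three separately built
# components (AM, PM, LM) combined at scoring time. All values hypothetical.

def acoustic_model(features):
    """AM: acoustic features -> one distribution over phonemes per frame.
    A fixed toy distribution stands in for a trained model."""
    return [{"k": 0.7, "ae": 0.2, "t": 0.1} for _ in features]

# PM: a hand-engineered lexicon mapping phoneme sequences to words.
LEXICON = {("k", "ae", "t"): "cat", ("b", "ae", "t"): "bat"}

def pronunciation_model(phonemes):
    return LEXICON.get(tuple(phonemes))

# LM: assigns probabilities to word hypotheses.
LM_PROBS = {"cat": 0.6, "bat": 0.4}

def language_model(word):
    return LM_PROBS.get(word, 0.0)

def score_hypothesis(features, phonemes):
    """Combine the three independently built components into one score."""
    am_score = 1.0
    for dist, ph in zip(acoustic_model(features), phonemes):
        am_score *= dist.get(ph, 0.0)
    word = pronunciation_model(phonemes)
    if word is None:
        return None, 0.0
    return word, am_score * language_model(word)

word, score = score_hypothesis([0, 1, 2], ["k", "ae", "t"])
# word -> "cat"; score -> 0.7 * 0.2 * 0.1 * 0.6 = 0.0084
```

An end-to-end system replaces all three hand-assembled pieces with one jointly trained network, which is exactly the contrast the talk addresses.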
Over the last several years, there has been a growing interest in developing "end-to-end" speech recognition systems which attempt to learn all of these components jointly in a single system. In this talk, I shall discuss our work involving various algorithmic and modeling improvements to build end-to-end speech recognition systems which surpass the performance of a conventional ASR system. I shall also discuss promising results obtained by applying this approach to the task of multi-lingual and multi-dialect speech recognition. Finally, I shall discuss some of the current challenges with these models and outline future research directions.
Bio: Rohit Prabhavalkar received his PhD in Computer Science and Engineering from The Ohio State University, USA, in 2013. Following his PhD, Rohit joined the Speech Technologies group at Google where he is currently a Staff Research Scientist. At Google, his research has focused primarily on developing compact acoustic models which can run efficiently on mobile devices, and on developing improved end-to-end automatic speech recognition systems. Rohit has co-authored over 30 refereed papers, which have received two best paper awards (ASRU 2017; ICASSP 2018). He currently serves as a member of the IEEE Speech and Language Processing Technical Committee.
FREE
Open to the Public