Hotword Sifter - Make Google Home hear you in cocktail parties
Date:
Fri, 03/16/2018 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar
Arden's approach combines auditory models (thus the Hearing Seminar connection) with multi-microphone speech enhancement and machine learning (of course) to solve the problem in a unique and powerful way. Come find out about the state of the art in all these areas at the CCRMA Hearing Seminar.
Who: Yiteng (Arden) Huang (Google)
What: Recognizing keywords in consumer devices
When: Friday March 16th at 10:30AM
Where: CCRMA Seminar Room
Why: It's important and it's everywhere.
Title: Hotword Sifter - Make Google Home hear you in cocktail parties
Speaker: Yiteng (Arden) Huang, Google
Co-Authors: Thad Hughes, Turaj Shabestary, Taylor Applebaum
In this talk, I will discuss our recent work on multichannel keyword spotting (KWS) and introduce an interesting new idea: microphone-array speech enhancement that is supervised by, and serves, machine learning. The idea was inspired by the human auditory system's ability to achieve the so-called cocktail party effect in adverse acoustic environments. We incorporated a feedback path from the neural network (NN) KWS classifier to its signal preprocessing frontend so that frontend noise reduction can benefit from, and in turn better serve, backend machine intelligence. This work complements the current mainstream KWS system architectures, which have only a sensory-driven bottom-up mechanism, with cognitively directed top-down processing. We found that the new system can significantly improve KWS performance for Google Home when there is strong music or TV noise in the background. While this innovative strategy of combining signal processing and machine learning was developed and validated for KWS, it can presumably be extended to many other applications, including noise-robust speaker identification and automatic speech recognition.
Speaker Biography:
Yiteng (Arden) Huang is a research scientist at Google in New York. His current focus is on signal (in particular speech and audio) processing and enhancement for machine intelligence. He received his Ph.D. in electrical and computer engineering (ECE) from Georgia Tech in 2001. Dr. Huang started his professional career at Bell Labs and then founded and ran WeVoice Inc. At WeVoice, he helped NASA develop acoustic and audio signal processing technologies and systems for future missions under SBIR (Small Business Innovation Research) contracts. Success stories include a voice communication subsystem for next-generation spacesuits and a ZigBee-based wireless sensor (microphone) network for automatic acoustic monitoring on the International Space Station (ISS).
Dr. Huang has published extensively on signal processing. He has co-authored/co-edited 8 books and published over 70 peer-reviewed journal and conference papers, and currently holds 6 US patents. He received the 2008 Best Paper Award and the 2002 Young Author Best Paper Award from the IEEE Signal Processing Society (SPS), the 2009/2010/2014 NASA Tech Brief Awards, the 2007 Bell Labs Role Model Team Award, the 2000-2001 Outstanding Graduate Teaching Assistant Award from the ECE School of Georgia Tech, the 2000 Outstanding Research Award from the Center of Signal and Image Processing, Georgia Tech, and the 1997-1998 Colonel Oscar P. Cleaver Outstanding Graduate Student Award from the ECE School of Georgia Tech.
Dr. Huang is also recognized for his service to the IEEE SPS. He was an associate editor of the EURASIP Journal on Applied Signal Processing from 2005 to 2008 and of the IEEE Signal Processing Letters from 2002 to 2005. He served on the IEEE SPS Signal Processing Theory & Methods Technical Committee from 2006 to 2012 and on the IEEE SPS Audio and Acoustic Signal Processing Technical Committee from 2004 to 2012. He was the technical co-chair of the 2005 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays and the 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
FREE
Open to the Public