Fri, 05/23/2014 - 11:00am - 12:30pm
CCRMA Seminar Room
The use of voice commands for human-computer interaction is becoming more prevalent thanks to recent advancements in automatic speech recognition (ASR) technologies. In typical acoustic environments, audio captured by a microphone contains background noise, reverberation, and signals from interfering sources, making reliable speech capture a challenging problem. Some applications, e.g., gaming, even require more than one user to interact with the system, which makes simultaneous speaker detection and localization crucial for enabling natural interactions. Distant multi-speaker speech capture often benefits from microphone arrays, which can provide enhanced speech signals through spatial filtering, or beamforming. Spatial filtering algorithms rely on accurate localization of the active speakers; without it, they may suppress the very signals of interest. For more robust speaker localization, signals from microphone arrays can be combined with those from RGB cameras and depth sensors via multimodal fusion. This talk will describe algorithms for sound source localization with a microphone array, including statistically optimal methods as well as methods based on cochlear modeling, along with their real-time implementations. It will also describe multimodal fusion for accurate detection and localization of simultaneous speakers. With successful integration, this technology will be one of the most crucial components for enabling more natural and intuitive human-computer interaction, bringing it one step closer to the ultimate goal of making machines interact the way humans do.
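The abstract does not specify which localization algorithms the talk covers, but a common baseline for microphone-array sound source localization is estimating the time difference of arrival (TDOA) between a microphone pair with the generalized cross-correlation with phase transform (GCC-PHAT). The sketch below, with assumed function names and parameters, illustrates the idea for a single pair of signals:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay (seconds) of `sig` relative to `ref`
    using GCC-PHAT. A positive result means `sig` lags `ref`."""
    # Zero-pad so the FFT-based correlation is not circular.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    # PHAT weighting: keep only the phase, discarding magnitude,
    # which sharpens the correlation peak under reverberation.
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Re-center the correlation so index 0 corresponds to zero delay.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Illustration: delay a white-noise signal by 5 samples and recover the lag.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
delayed = np.concatenate((np.zeros(5), x[:-5]))
tau = gcc_phat(delayed, x, fs=8000)
```

With TDOAs from several microphone pairs and known array geometry, the source direction can then be triangulated; the statistically optimal and cochlear-model methods mentioned in the talk can be viewed as refinements of this basic pipeline.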
Bowon Lee received the B.S. degree in Electrical Engineering from Seoul National University, Seoul, Korea, in 2000, and the M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2003 and 2006, respectively. From 2007 to 2014, he was a research scientist at Hewlett-Packard Laboratories in Palo Alto, California, before joining the faculty of the Department of Electronic Engineering at Inha University in March 2014. His research interests include statistical signal processing for audio and speech, microphone array signal processing, acoustic event detection and localization, and multimodal signal processing. He received two top 10% awards from the IEEE Workshop on Multimedia Signal Processing in 2009 and has served on the technical program committees of numerous IEEE conferences and workshops. He is a senior member of the IEEE and a member of the Audio Engineering Society.