Aaron Master (Dolby) - DeepSpace: Dynamic Spatial and Source Cue Based Source Separation for Dialog Enhancement
Date:
Fri, 04/28/2023 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar 
Who: Aaron Master (Dolby)
What: DeepSpace: Dynamic Spatial and Source Cue Based Source Separation for Dialog Enhancement
When: Friday April 28th, 2023 at 10:30AM
Where: CCRMA Seminar Room (Top Floor of the Knoll at Stanford)
Why: How can we improve our listening environment?
We'll head to Tresidder for lunch after the talk; join us if you wish.
Title: DeepSpace: Spatial and Source Cue Based Source Separation for Dialog Enhancement and More
Presenter: Aaron Master
Authors: Aaron Master, Lie Lu, Jonas Samuelsson, Scott Norcross, Heidi-Maria Lehtonen, Harald Mundt, Dan Darcy, Nathan Swedlow, Audrey Howard
Abstract:
Dialog Enhancement (DE) is a feature that allows a user to increase the level of dialog in TV or movie content relative to non-dialog sounds. When only the original mix is available, DE is “unguided” and requires source separation. In this talk, I will describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided DE. Its technologies include spatio-level filtering (SLF) and deep learning-based dialog classification and denoising; the combination allows for greatly improved performance over systems designed to process voice communications signals. Using subjective listening tests, we show that DeepSpace demonstrates significantly improved overall performance relative to state-of-the-art systems available for testing. We also explore the feasibility of using existing automated metrics to evaluate unguided DE systems. Depending on audience interest, additional topics can be covered, including (1) repurposing the SLF system to turn any monaural speech denoising system into a perceptually optimized stereo processing system, (2) perceptually optimized source suppression, and (3) the accessibility impact of Dialog Enhancement on audiences that include seniors.
Bio:
Aaron Master received a BSEE from the University of Rochester (NY) in 1999, a B.Mus. from the Eastman School of Music in 1999, an M.Phil. in Engineering from the University of Cambridge (UK) in 2000, and a Ph.D. in Electrical Engineering from Stanford University in 2006. He worked as a research engineer and UX director at SoundHound Inc. from 2006 to 2013, where he was a lead inventor of technologies allowing combined query-by-humming and automatic content recognition (ACR), instant-response ACR, automatically synchronized lyrics, and song popularity prediction. Apps he managed received awards from the New York Times, Time Magazine, CNet, Popular Science, and Billboard. Dr. Master served as manager and senior manager of sound technology at Dolby Laboratories from 2013 to 2022, where he led work on source separation for dialog enhancement; additional research interests include spatial audio and human perception. Dr. Master is first author of over 30 peer-reviewed papers and patents.
See this paper for details: https://arxiv.org/abs/2302.08202
FREE
Open to the Public