Auditory Separation of a Conversation from Background via Attentional Gating

Date: Fri, 10/11/2019, 10:30am - 12:00pm
Location: CCRMA Seminar Room
Event Type: Hearing Seminar
The latest speech-enhancement work has the potential to dramatically change the way we hear the world around us, whether we have normal hearing or need assistance. Recent advances have dramatically improved both the quality and the latency of these algorithms. These new systems build highly sophisticated models of speech, and can pick the speech signal out of the noise. Lots of training data makes this possible, but that is easy to get by adding noise to clean speech signals.
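To make that last point concrete, here is one way to manufacture as many (noisy, clean) training pairs as you like. This little NumPy sketch is just an illustration of the idea, not code from the talk; the function name and details are my own:

    import numpy as np

    def mix_at_snr(clean, noise, snr_db):
        """Mix clean speech with noise at a target SNR (dB).

        The returned (mixture, clean) pair can serve as one
        training example for a speech-enhancement network.
        """
        # Tile or trim the noise to match the length of the speech.
        if len(noise) < len(clean):
            reps = int(np.ceil(len(clean) / len(noise)))
            noise = np.tile(noise, reps)
        noise = noise[:len(clean)]

        # Scale the noise so the speech-to-noise power ratio equals snr_db.
        speech_power = np.mean(clean ** 2)
        noise_power = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
        return clean + scale * noise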
 
This coming Friday, Shariq Mobin will present his work at the intersection of neurophysiology and deep neural networks for speech enhancement. He uses a model of cognitive attention to pick the desired speaker out of a cacophony of sounds. Very deep, non-linear networks learn the characteristics of a speech signal. These are paired with a network that understands how speakers differ from each other, which allows his system to pay attention to one speaker at a time. This is the state of the art in speech enhancement.
 
Who: Shariq Mobin
What: Auditory Separation of a Conversation from Background via Attentional Gating
When: Fri, 10/11/2019, from 10:30am - 12:00pm
Where: CCRMA Seminar Room
Why: Speech is a seriously interesting signal and we want to make it clearer.
 
Bring your favorite speech perception system to CCRMA, and we’ll talk about tools to give it a better signal.
 
- Malcolm

Auditory Separation of a Conversation from Background via Attentional Gating
Shariq Mobin
 
Abstract: 
We present a model for separating a set of voices out of a sound mixture containing an unknown number of sources. Our Attentional Gating Network (AGN) uses a variable attentional context to specify which speakers in the mixture are of interest. The attentional context is specified by an embedding vector which modifies the processing of a neural network through an additive bias. Individual speaker embeddings are learned to separate a single speaker while superpositions of the individual speaker embeddings are used to separate sets of speakers. We first evaluate AGN on a traditional single speaker separation task and show an improvement of 9% with respect to comparable models. Then, we introduce a new task to separate an arbitrary subset of voices from a mixture of an unknown-sized set of voices, inspired by the human ability to separate a conversation of interest from background chatter at a cafeteria. We show that AGN is the only model capable of solving this task, performing only 7% worse than on the single speaker separation task.
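To make the gating mechanism concrete, here is a minimal PyTorch sketch of the idea as the abstract describes it: a speaker embedding enters the network as an additive bias, and a sum (superposition) of embeddings selects a set of speakers. All names, dimensions, and the surrounding structure are illustrative assumptions, not the actual AGN code:

    import torch
    import torch.nn as nn

    class GatedLayer(nn.Module):
        """One hidden layer biased by an attentional-context embedding.

        Hypothetical sketch of the mechanism in the abstract,
        not Mobin's actual architecture.
        """
        def __init__(self, feat_dim, embed_dim):
            super().__init__()
            self.linear = nn.Linear(feat_dim, feat_dim)
            self.bias_proj = nn.Linear(embed_dim, feat_dim)  # embedding -> additive bias

        def forward(self, x, context):
            # context: one speaker embedding, or a superposition (sum)
            # of several embeddings to attend to a set of speakers.
            return torch.relu(self.linear(x) + self.bias_proj(context))

    # Attend to speakers 3 and 7 at once by superposing their embeddings.
    speaker_embeddings = nn.Embedding(num_embeddings=100, embedding_dim=32)
    context = speaker_embeddings(torch.tensor([3, 7])).sum(dim=0)

    layer = GatedLayer(feat_dim=256, embed_dim=32)
    frames = torch.randn(10, 256)       # e.g., 10 spectrogram frames
    gated = layer(frames, context)      # processing biased toward speakers 3 and 7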
 
Speaker Bio:
Shariq Mobin is the founder of AudioFocus, a startup developing new hearing aid technology using deep learning. He recently obtained his PhD in auditory neuroscience at the Redwood Center for Theoretical Neuroscience under Professor Bruno Olshausen. His research focused on connecting human attention to deep learning models of sound.
 
 
FREE
Open to the Public