Source coding of audio signals with a generative model

Date: Fri, 02/07/2020, 10:30am–12:00pm
Location: CCRMA Seminar Room
Event Type: Hearing Seminar

Audio coders have historically taken two approaches: 1) source coders, which understand how a signal is generated, often by a vocal tract, and represent it with a few time-varying quantized parameters; and 2) perceptual coders, which capitalize on “flaws” in the perceptual system by dropping parts of the signal that are not heard.
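
For concreteness, here is a minimal sketch (in Python, and not the speakers' code) of the source-coding idea: reduce each frame of speech to a small set of quantized linear-prediction (LPC) coefficients plus a gain, a classic all-pole model of the vocal tract.

    import numpy as np

    def lpc_frame_params(frame, order=10, n_bits=6):
        """Reduce one speech frame to quantized LPC coefficients and a gain."""
        # Autocorrelation method: solve the Yule-Walker equations for the
        # all-pole (vocal-tract) filter coefficients.
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        R = np.array([[r[abs(i - j)] for j in range(order)]
                      for i in range(order)])
        a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
        gain = np.sqrt(max(r[0] - a @ r[1:order + 1], 0.0))

        def q(x):
            # Coarse uniform quantization to n_bits of fraction.
            return np.round(np.asarray(x) * (1 << n_bits)) / (1 << n_bits)

        # These few numbers per frame are the entire transmitted signal.
        return q(a), q(gain)

At a 10th-order model over, say, 20 ms frames, this amounts to a few kilobits per second, which is what makes source coders so compact.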

This week at the CCRMA Hearing Seminar, Roy Fejgin and Cong Zhou from Dolby will talk about a new kind of speech source coder that uses a deep neural network (DNN) to synthesize higher-quality speech from an encoding generated by a conventional source coder. A few years ago, a generative DNN called WaveNet produced the best synthetic speech by “knowing” what speech sounds like and generating new samples, conditioned on parameters, a sample (or block of samples) at a time.
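
Here is a hedged sketch of that autoregressive loop, with a hypothetical model object standing in for the trained network (its model(context, params) interface is an assumption, not WaveNet's actual API): at each step the network emits a distribution over the next mu-law-quantized sample, conditioned on recent output and on the coder's parameters.

    import numpy as np

    def mu_law_expand(idx, mu=255):
        """Map a mu-law quantization index back to a sample in [-1, 1]."""
        y = 2.0 * idx / mu - 1.0
        return np.sign(y) * ((1.0 + mu) ** abs(y) - 1.0) / mu

    def generate(model, cond_params, n_samples, receptive_field=1024):
        """Draw speech one sample at a time from a conditional model."""
        rng = np.random.default_rng(0)
        history = [0.0] * receptive_field           # start from silence
        for t in range(n_samples):
            context = np.array(history[-receptive_field:])
            probs = model(context, cond_params[t])  # distribution over levels
            idx = rng.choice(len(probs), p=probs)   # sample the next value
            history.append(mu_law_expand(idx))
        return np.array(history[receptive_field:])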

The Dolby work, to be discussed on Friday, combines these two ideas: a source coder analyzes the speech, and a WaveNet then generates new speech from the coder's parameters. The result has higher quality, and, perhaps more importantly, the bit rate can be changed without retraining the network.
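
In sketch form (reusing the hypothetical pieces above), the reason the bit rate decouples from the network is that the rate lives entirely in how coarsely the conditioning parameters are quantized, so changing it touches the encoder but never the trained generator.

    def encode(frames, bits_per_param):
        """The bit rate is set here, at the quantizer, not in the DNN."""
        return [lpc_frame_params(f, n_bits=bits_per_param) for f in frames]

    def decode(model, frame_params, samples_per_frame):
        """One trained generator serves any bit rate."""
        # Hold each frame's parameters constant across its samples.
        per_sample = [p for p in frame_params
                      for _ in range(samples_per_frame)]
        return generate(model, per_sample, len(per_sample))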

Who: Roy Fejgin and Cong Zhou (Dolby)
What: Speech coding with a deep generative model
When: Friday, February 7th, at 10:30am
Where: CCRMA Seminar Room
Why: DNNs rule the world, but can they code?

Bring your voice and ears to CCRMA and we’ll talk about how to compress them.



Source coding of audio signals with a generative model
Roy Fejgin and Cong Zhou (Dolby)

In recent years, deep generative models like SampleRNN and WaveNet have shown a remarkable ability to generate realistic-sounding speech. One application of these models has been the problem of low-bitrate speech coding. We will discuss our work “High-Quality Speech Coding with SampleRNN” (published at ICASSP 2019), where we demonstrate that speech can be coded at high quality at very low bitrates. We will then present an extension of the approach to additional signal categories, based on our ICASSP 2020 paper “Source Coding of Audio Signals with a Generative Model”. The strengths and limitations of the proposed scheme will be discussed.

This work was done in collaboration with Janusz Klejsa, Lars Villemoes, and Per Hedelin of the Dolby Sweden office.



Bios
Roy Fejgin is a Senior Staff Engineer on Dolby Laboratories’ Applied AI team. He received his Master’s degree in Music, Science and Technology from CCRMA in 2010 and his B.Sc. in Computer Science from the Hebrew University of Jerusalem. At Dolby, Roy has worked on perceptual and lossless audio coding, and in recent years has focused on the application of deep learning to speech and audio coding.

Cong Zhou is a Staff Engineer on Dolby Laboratories’ Applied AI team. He received his Master's degree in Electrical Engineering (Multimedia and Creative Technology) from USC in 2011. Cong’s primary research interests include deep learning, spatial audio capture in virtual reality, and audio and speech processing. His recent work includes voice conversion and speech coding with deep generative models.
FREE
Open to the Public