Source coding of audio signals with a generative model
Date:
Fri, 02/07/2020 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar

This week at the CCRMA Hearing Seminar, Roy Fejgin and Cong Zhou from Dolby will talk about a new kind of speech source coder that uses a deep neural network (DNN) to synthesize higher-quality speech from an encoding generated by a conventional source coder. A few years ago, a generative DNN called WaveNet produced the best synthetic speech by “knowing” what speech sounds like and generating new samples, conditioned on parameters, one sample (or block of samples) at a time.
The Dolby work, to be discussed on Friday, combines these two ideas: a conventional source coder analyzes the speech, and a WaveNet-style generative model then synthesizes new speech from the coded parameters. The result has higher quality, and, perhaps more importantly, the bit rate can be changed without retraining.
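The analysis-then-generative-synthesis pipeline can be sketched in a few lines. This is a purely illustrative toy, not the Dolby system: `encode` and `generative_decode` are hypothetical stand-ins (a crude spectral-envelope analyzer and a one-tap autoregressive sampler) for a real parametric coder and a SampleRNN/WaveNet-class decoder.

```python
import numpy as np

FRAME = 160  # samples per frame (e.g. 10 ms at 16 kHz) -- illustrative choice

def encode(frame):
    """Toy 'conventional coder' analysis: reduce a frame to a few
    coarse log-energy band parameters (the bitstream payload)."""
    spectrum = np.abs(np.fft.rfft(frame))
    bands = np.array_split(spectrum, 8)
    return np.array([np.log1p(b.mean()) for b in bands])

def generative_decode(params, n_samples=FRAME, seed=0):
    """Toy autoregressive decoder: each output sample depends on the
    previous sample and the conditioning parameters. A stand-in for
    a conditioned generative DNN, not an actual WaveNet."""
    rng = np.random.default_rng(seed)
    gain = params.mean()
    out = np.zeros(n_samples)
    for t in range(1, n_samples):
        out[t] = 0.9 * out[t - 1] + 0.1 * gain * rng.standard_normal()
    return out

# One frame through the pipeline: analyze, "transmit" parameters, resynthesize.
frame = np.sin(2 * np.pi * 440 * np.arange(FRAME) / 16000)
code = encode(frame)             # 8 parameters instead of 160 samples
recon = generative_decode(code)  # new samples conditioned on those parameters
print(code.shape, recon.shape)
```

The point of the sketch is the division of labor: the bitstream carries only compact parameters, and the decoder invents plausible waveform detail conditioned on them, which is why the bit rate can change without retraining the generative model.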
Who: Roy Fejgin and Cong Zhou (Dolby)
What: Speech coding with a deep generative model
When: Friday February 7th at 10:30AM
Where: CCRMA Seminar Room
Why: DNNs rule the world, but can they code?
Bring your voice and ears to CCRMA and we’ll talk about how to compress them.
Source coding of audio signals with a generative model
Roy Fejgin and Cong Zhou (Dolby)
In recent years, deep generative models like SampleRNN and WaveNet have shown a remarkable ability to generate realistic-sounding speech. One application of these models has been to the problem of low-bitrate speech coding. We will discuss our work “High-Quality Speech Coding with SampleRNN” (ICASSP 2019), where we demonstrate that speech can be coded at high quality at very low bitrates. We will then present an extension of the approach to additional signal categories, based on our ICASSP 2020 paper “Source Coding of Audio Signals with a Generative Model”. The strengths and limitations of the proposed scheme will be discussed.
This work was done in collaboration with Janusz Klejsa, Lars Villemoes, and Per Hedelin of the Dolby Sweden office.
Bios
Roy Fejgin is a Senior Staff Engineer on Dolby Laboratories’ Applied AI team. He received his Master’s degree in Music, Science and Technology from CCRMA in 2010, and his B.Sc. in Computer Science from the Hebrew University of Jerusalem. At Dolby, Roy has worked on perceptual and lossless audio coding, and in recent years has focused on the application of deep learning to speech and audio coding.
Cong Zhou is a Staff Engineer on Dolby Laboratories’ Applied AI team. He received his Master’s degree in Electrical Engineering (Multimedia and Creative Technology) from USC in 2011. Cong’s primary research interests include deep learning, spatial audio capture in virtual reality, and audio and speech processing. His recent work includes voice conversion and speech coding with deep generative models.
FREE
Open to the Public