Source coding of audio signals with a generative model

Date: Fri, 02/07/2020, 10:30am–12:00pm
Location: CCRMA Seminar Room
Event Type: Hearing Seminar

Audio coders have historically taken two approaches: 1) source coders, which understand how a signal is generated, often by a vocal tract, and represent it with a few time-varying quantized parameters; and 2) perceptual coders, which capitalize on “flaws” in the perceptual system by dropping parts of the signal that are not heard.
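
For concreteness, here is a minimal sketch (in Python, and not the speakers' code) of the source-coding idea: reduce each frame of speech to a small set of quantized linear-prediction (LPC) coefficients plus a gain, a classic all-pole model of the vocal tract.

    import numpy as np

    def lpc_frame_params(frame, order=10, n_bits=6):
        """Reduce one speech frame to quantized LPC coefficients and a gain."""
        # Autocorrelation method: solve the Yule-Walker equations for the
        # all-pole (vocal-tract) filter coefficients.
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        R = np.array([[r[abs(i - j)] for j in range(order)]
                      for i in range(order)])
        a = np.linalg.solve(R + 1e-9 * np.eye(order), r[1:order + 1])
        gain = np.sqrt(max(r[0] - a @ r[1:order + 1], 0.0))

        def q(x):
            # Coarse uniform quantization to n_bits of fraction.
            return np.round(np.asarray(x) * (1 << n_bits)) / (1 << n_bits)

        # These few numbers per frame are the entire transmitted signal.
        return q(a), q(gain)

At a 10th-order model over, say, 20 ms frames, this amounts to a few kilobits per second, which is what makes source coders so compact.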

This week at the CCRMA Hearing Seminar, Roy Fejgin and Cong Zhou from Dolby will talk about a new kind of speech source coder that uses a deep neural network (DNN) to synthesize higher-quality speech from an encoding generated by a conventional source coder. A few years ago, a generative DNN called WaveNet produced the best synthetic speech by “knowing” what speech sounds like and generating new samples, conditioned on parameters, a sample (or block of samples) at a time.
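
Here is a hedged sketch of that autoregressive loop, with a hypothetical model object standing in for the trained network (its model(context, params) interface is an assumption, not WaveNet's actual API): at each step the network emits a distribution over the next mu-law-quantized sample, conditioned on recent output and on the coder's parameters.

    import numpy as np

    def mu_law_expand(idx, mu=255):
        """Map a mu-law quantization index back to a sample in [-1, 1]."""
        y = 2.0 * idx / mu - 1.0
        return np.sign(y) * ((1.0 + mu) ** abs(y) - 1.0) / mu

    def generate(model, cond_params, n_samples, receptive_field=1024):
        """Draw speech one sample at a time from a conditional model."""
        rng = np.random.default_rng(0)
        history = [0.0] * receptive_field           # start from silence
        for t in range(n_samples):
            context = np.array(history[-receptive_field:])
            probs = model(context, cond_params[t])  # distribution over levels
            idx = rng.choice(len(probs), p=probs)   # sample the next value
            history.append(mu_law_expand(idx))
        return np.array(history[receptive_field:])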

The Dolby work, to be discussed on Friday, combines these two ideas: a source coder analyzes the speech, and a WaveNet then generates new speech from the coder's parameters. The result has higher quality, and, perhaps more importantly, the bit rate can be changed without retraining the network.
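
In sketch form (reusing the hypothetical pieces above), the reason the bit rate decouples from the network is that the rate lives entirely in how coarsely the conditioning parameters are quantized, so changing it touches the encoder but never the trained generator.

    def encode(frames, bits_per_param):
        """The bit rate is set here, at the quantizer, not in the DNN."""
        return [lpc_frame_params(f, n_bits=bits_per_param) for f in frames]

    def decode(model, frame_params, samples_per_frame):
        """One trained generator serves any bit rate."""
        # Hold each frame's parameters constant across its samples.
        per_sample = [p for p in frame_params
                      for _ in range(samples_per_frame)]
        return generate(model, per_sample, len(per_sample))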

Who: Roy Fejgin and Cong Zhou (Dolby)
What: Speech coding with a deep generative model
When: Friday, February 7th, at 10:30am
Where: CCRMA Seminar Room
Why: DNNs rule the world, but can they code?

Bring your voice and ears to CCRMA and we’ll talk about how to compress them.



Source coding of audio signals with a generative model
Roy Fejgin and Cong Zhou (Dolby)

In recent years, deep generative models like SampleRNN and WaveNet have shown a remarkable ability to generate realistic-sounding speech. One application of these models has been the problem of low-bitrate speech coding. We will discuss our work “High-Quality Speech Coding with SampleRNN” (published at ICASSP 2019), where we demonstrate that speech can be coded at high quality at very low bitrates. We will then present an extension of the approach to additional signal categories, based on our ICASSP 2020 paper “Source Coding of Audio Signals with a Generative Model”. The strengths and limitations of the proposed scheme will be discussed.

This work was done in collaboration with Janusz Klejsa, Lars Villemoes, and Per Hedelin of the Dolby Sweden office.



Bios
Roy Fejgin is a Senior Staff Engineer on Dolby Laboratories’ Applied AI team. He received his Master’s degree in Music, Science and Technology from CCRMA in 2010 and his B.Sc. in Computer Science from the Hebrew University of Jerusalem. At Dolby, Roy has worked on perceptual and lossless audio coding, and in recent years has focused on the application of deep learning to speech and audio coding.

Cong Zhou is a Staff Engineer on Dolby Laboratories’ Applied AI team. He received his Master's degree in Electrical Engineering (Multimedia and Creative Technology) from USC in 2011. Cong’s primary research interests include deep learning, spatial audio capture in virtual reality, and audio and speech processing. His recent work includes voice conversion and speech coding with deep generative models.
FREE
Open to the Public