Exploring Neural Audio Coding Methods

Date:

Fri, 05/24/2024 - 10:30am - 12:00pm

Location:

CCRMA Seminar Room

Event Type:

Hearing Seminar

Deep neural networks are good at doing many things, such as classification, prediction, and regression. But perhaps their most impressive accomplishment is to learn an *implicit* representation of a signal. Instead of just predicting a value, a DNN can learn the underlying function f(x) that generates the data. Given a query position x, it returns the “true” value of the function, even at positions it has not seen.

This week at the CCRMA Hearing Seminar, Senyuan Fan and Prof. Marina Bosi will argue that these implicit methods require less training data and decode faster than end-to-end neural codecs, while still modeling full-bandwidth audio. How do they do that?

Who: Senyuan Fan and Marina Bosi
What: Exploring Neural Audio Coding Methods
When: Friday May 24th at 10:30AM
Where: CCRMA Seminar Room, Top Floor of the Knoll at Stanford
Why: Let’s learn the underlying function. That would be cool.

As long as I still have you, a couple of notes about past and coming Hearing Seminars.

The Hearing Seminar is now 35 years old, and I gave a brief retrospective for the CCRMA Open House last week. Here are my slides:
https://docs.google.com/presentation/d/1yKNoK2BRDSLrjyLnO70B0Kq5MfsE_VrrgJ2aYmdLQXM/edit?usp=sharing
I have records for 500 seminars over these years. That’s a big number. Thanks to all of you for coming.

And in two weeks, we’ll have an amazing panel on the work done at Stanford to design speech processors for cochlear implants. I’ll send out more details next weekend, but if you want a preview, go to this web page:
https://ccrma.stanford.edu/events/robert-l-whites-cochlear-implants

In 35 years we’ve covered a bit of everything, from cochlear implants to implicit function learning with DNNs. And I’ve learned a lot from all of you. Thanks for coming.

- Malcolm

Senyuan Fan and Marina Bosi will talk about their work on creating neural representations of audio.

Abstract
Audio coding can be implemented through concise neural codes employing end-to-end neural networks. While this method has shown promise in achieving high compression ratios, the reconstructed audio quality frequently suffers. Implicit neural representations (INRs) have demonstrated remarkable capability in modeling various complex signals, spanning from radiance fields and 3D shapes to images, videos, and audio. In contrast to end-to-end neural audio codecs, INRs do not necessitate extensive training data and show notably faster decoding speeds. However, existing research focuses on low-sample-rate audio and has sacrificed quality for coding efficiency. In this talk, we will demonstrate the methods we used and the insights we gained while training fully connected multilayer perceptron networks (MLPs) with periodic activation functions. We will show that INRs can proficiently model full-bandwidth audio signals.
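
For readers new to the idea, here is a minimal sketch of what an MLP with periodic activations looks like when it maps a time coordinate to an audio amplitude. It is not the speakers' implementation. The layer sizes, the frequency scale w0 = 30, the SIREN-style initialization, and the 48 kHz sample rate are all assumptions chosen for illustration, and the network here is untrained. In an INR codec the weights would be fit to a single clip, and the weights themselves would serve as the compressed code.

```python
import numpy as np

def init_siren(layer_sizes, w0=30.0, seed=0):
    """Create weights for a sine-activated MLP (SIREN-style init, assumed here)."""
    rng = np.random.default_rng(seed)
    params = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        # First layer: U(-1/n_in, 1/n_in); later layers: U(+/- sqrt(6/n_in)/w0).
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / w0
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        b = rng.uniform(-bound, bound, size=n_out)
        params.append((W, b))
    return params

def siren_forward(params, t, w0=30.0):
    """Map time coordinates t (shape [N, 1]) to amplitudes (shape [N, 1])."""
    h = t
    for W, b in params[:-1]:
        h = np.sin(w0 * (h @ W + b))  # periodic activation on hidden layers
    W, b = params[-1]
    return h @ W + b                  # linear output layer

def count_params(params):
    """Total number of stored weights and biases."""
    return sum(W.size + b.size for W, b in params)

# One second of a hypothetical 48 kHz signal, time normalized to [-1, 1].
sample_rate = 48_000
t = np.linspace(-1.0, 1.0, sample_rate).reshape(-1, 1)

params = init_siren([1, 64, 64, 1])   # hypothetical, deliberately tiny network
y = siren_forward(params, t)          # untrained output, one value per sample
n_params = count_params(params)

print(y.shape, n_params)
```

Decoding in this picture is just a forward pass over the requested time points, which is one reason INR decoding can be fast. Whether a network this small can reach acceptable quality on real audio is a separate question, and presumably part of what the talk addresses.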

Bios
Senyuan's Bio: Senyuan Fan is a second-year MA/MST student at CCRMA. He built his background in audio signal processing and machine learning at the University of Rochester, where he earned a B.S. and an M.S. in Electrical and Computer Engineering. At CCRMA, he has focused his research on audio coding methods, advised by Prof. Marina Bosi. Senyuan is also a classically trained violinist and an enthusiastic participant in chamber and symphonic music.

Marina Bosi, a founding director of the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) and chair of the IEEE Standards Association (SA) Context-based Audio Enhancement (CAE) Working Group, is a pioneer in the field of digital audio coding and has enjoyed a distinguished career as a researcher, leader, and educator in the fields of digital media technology, digital rights management, IP licensing, and AI. A Fellow and Past President of the Audio Engineering Society, Dr. Bosi was chief technology officer of MPEG LA, LLC, Denver, CO, vice president-technology at DTS, Inc., Los Angeles, CA, and was a member of the research team that created Dolby Digital at Dolby Laboratories, San Francisco, CA, where she also led the MPEG-2 AAC development for which she received the ISO/IEC 1997 Editor Award. Marina has devoted herself to sharing her hard-won knowledge with the next generation of audio engineers. She launched the first North American university course on perceptual audio coding at Stanford University’s CCRMA, where she is currently teaching. Dr. Bosi holds a degree in Physics from the University of Florence, completing her dissertation at IRCAM in Paris, and a degree from the Conservatory of Florence, having later served as a faculty member at the Conservatory of Venice. She is a graduate of Stanford Business School's Executive Program and has held fiduciary positions on several boards. A sought-after keynote speaker, Dr. Bosi holds multiple patents and has authored significant contributions to academic literature, including the textbook Introduction to Digital Audio Coding and Standards (Kluwer/Springer, 2002). In recognition of her achievements, Dr. Bosi has received numerous awards, including the AES Silver Medal.

FREE

Open to the Public
