Exploring Neural Audio Coding Methods
Date:
Fri, 05/24/2024 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar
This week at the CCRMA Hearing Seminar, Senyuan Fan and Prof. Marina Bosi claim that implicit neural representations (INRs) require less training data and achieve higher compression rates than other approaches. How do they do that?
Who: Senyuan Fan and Marina Bosi
What: Exploring Neural Audio Coding Methods
When: Friday May 24th at 10:30AM
Where: CCRMA Seminar Room, Top Floor of the Knoll at Stanford
Why: Let’s learn the underlying function. That would be cool.
As long as I still have you, a couple of notes about past and upcoming Hearing Seminars.
The Hearing Seminar is now 35 years old, and I gave a brief retrospective for the CCRMA Open House last week. Here are my slides:
https://docs.google.com/presentation/d/1yKNoK2BRDSLrjyLnO70B0Kq5MfsE_VrrgJ2aYmdLQXM/edit?usp=sharing
I have records for 500 seminars over these years. That’s a big number. Thanks to all of you for coming.
And in two weeks, we’ll have an amazing panel on the work done at Stanford to design speech processors for cochlear implants. I’ll send out more details next weekend, but if you want a preview, go to this web page:
https://ccrma.stanford.edu/events/robert-l-whites-cochlear-implants
In 35 years we’ve covered a bit of everything, from cochlear implants to implicit function learning with DNNs. And I’ve learned a lot from all of you. Thanks for coming.
- Malcolm
Senyuan Fan and Marina Bosi will talk about their work on creating neural representations of audio.
Abstract
Audio coding can be implemented through concise neural codes employing end-to-end neural networks. While this method has shown promise in achieving high compression ratios, the reconstructed audio quality frequently suffers. Implicit neural representations (INRs) have demonstrated remarkable capability in modeling various complex signals, spanning from radiance fields and 3D shapes to images, videos, and audio. In contrast to end-to-end neural audio codecs, INRs do not necessitate extensive training data and show notably faster decoding speeds. However, existing research focuses on low-sample-rate audio and has sacrificed quality for coding efficiency. In this talk, we will demonstrate the methods we used and the insights we gained while training fully connected multilayer perceptron networks (MLPs) with periodic activation functions. We will show that INRs can proficiently model full-bandwidth audio signals.
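For readers new to the idea, here is a minimal sketch of what "an MLP with periodic activation functions" fitted to audio can look like. It is not the speakers' implementation. It assumes a SIREN-style setup (sine activations with a frequency scale w0 = 30 and the uniform initialization commonly used with it), a toy 440 Hz tone at an assumed 48 kHz sample rate, and plain gradient descent chosen only for brevity. All function names, layer sizes, and hyperparameters are hypothetical.

```python
import numpy as np

def init_siren(layer_sizes, w0=30.0, seed=0):
    """Return a list of (W, b) pairs with a SIREN-style uniform initialization."""
    rng = np.random.default_rng(seed)
    params = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        # First layer: U(-1/n_in, 1/n_in); later layers: U(-sqrt(6/n_in)/w0, +...).
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / w0
        params.append((rng.uniform(-bound, bound, size=(n_in, n_out)),
                       np.zeros(n_out)))
    return params

def forward(params, t, w0=30.0):
    """Map time coordinates t (shape [N, 1]) to predicted amplitudes (shape [N, 1])."""
    h = t
    for W, b in params[:-1]:
        h = np.sin(w0 * (h @ W + b))   # periodic (sine) activation
    W, b = params[-1]
    return h @ W + b                   # linear output layer

def mse_and_grads(params, t, y, w0=30.0):
    """Mean squared error and its gradients, via manual backpropagation."""
    acts, pre, h = [t], [], t
    for W, b in params[:-1]:
        z = w0 * (h @ W + b)
        pre.append(z)
        h = np.sin(z)
        acts.append(h)
    W_out, b_out = params[-1]
    err = (h @ W_out + b_out) - y
    loss = float(np.mean(err ** 2))
    grads = [None] * len(params)
    delta = 2.0 * err / err.size                 # dLoss / dOutput
    grads[-1] = (acts[-1].T @ delta, delta.sum(axis=0))
    dh = delta @ W_out.T
    for i in range(len(params) - 2, -1, -1):
        dz = dh * np.cos(pre[i]) * w0            # chain rule through sin(w0 * u)
        grads[i] = (acts[i].T @ dz, dz.sum(axis=0))
        dh = dz @ params[i][0].T
    return loss, grads

def fit(params, t, y, steps=200, lr=1e-3, w0=30.0):
    """Plain gradient descent on the MSE; returns updated params and final loss."""
    for _ in range(steps):
        _, grads = mse_and_grads(params, t, y, w0)
        params = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(params, grads)]
    return params, mse_and_grads(params, t, y, w0)[0]

def count_params(params):
    """Number of stored weights and biases: the 'code' that represents the signal."""
    return sum(W.size + b.size for W, b in params)

# Toy signal (assumption): 100 ms of a 440 Hz tone at 48 kHz.
sample_rate = 48_000
n_samples = 4_800
t = np.linspace(-1.0, 1.0, n_samples).reshape(-1, 1)   # normalized time axis
y = np.sin(2 * np.pi * 440 * np.arange(n_samples) / sample_rate).reshape(-1, 1)

params = init_siren([1, 32, 32, 1])
initial_loss = mse_and_grads(params, t, y)[0]
params, final_loss = fit(params, t, y)
print("MSE before / after:", initial_loss, final_loss)
print("samples per stored parameter:", n_samples / count_params(params))
```

In this framing the network weights are the encoded signal, and decoding is a single forward pass over the time coordinates. The sketch makes no claim about reconstruction quality. A practical codec would also need a stronger optimizer, weight quantization, and a perceptual evaluation.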
Bios
Senyuan's Bio: Senyuan Fan is a second-year MA/MST student at CCRMA. He built his background in audio signal processing and machine learning at the University of Rochester, where he earned a B.S. and an M.S. in Electrical and Computer Engineering. At CCRMA, he has focused his research on audio coding methods, advised by Prof. Marina Bosi. Senyuan is also a classically trained violinist and an enthusiastic participant in chamber and symphonic music.
Marina's Bio: Marina Bosi, a founding director of the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) and chair of the IEEE Standards Association (SA) Context-based Audio Enhancement (CAE) Working Group, is a pioneer in the field of digital audio coding and has enjoyed a distinguished career as a researcher, leader, and educator in the fields of digital media technology, digital rights management, IP licensing, and AI. A Fellow and Past President of the Audio Engineering Society, Dr. Bosi was chief technology officer of MPEG LA, LLC, Denver, CO, vice president-technology at DTS, Inc., Los Angeles, CA, and was a member of the research team that created Dolby Digital at Dolby Laboratories, San Francisco, CA, where she also led the MPEG-2 AAC development for which she received the ISO/IEC 1997 Editor Award. Marina has devoted herself to sharing her hard-won knowledge with the next generation of audio engineers. She launched the first North American university course on perceptual audio coding at Stanford University’s CCRMA, where she is currently teaching. Dr. Bosi holds a degree in Physics from the University of Florence, completing her dissertation at IRCAM in Paris, and a degree from the Conservatory of Florence, having later served as a faculty member at the Conservatory of Venice. She is a graduate of Stanford Business School's Executive Program and has held fiduciary positions on several boards. A sought-after keynote speaker, Dr. Bosi holds multiple patents and has authored significant contributions to academic literature, including the textbook Introduction to Digital Audio Coding and Standards (Kluwer/Springer, 2002). In recognition of her achievements, Dr. Bosi has received numerous awards, including the AES Silver Medal.
FREE
Open to the Public