Les Atlas (UW) - Better clipping for audio spectrogram DNNs
Date:
Fri, 03/03/2023 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar
But audio has always been troublesome for deep networks. What the heck do you do with that damn phase? Sometimes you can just throw it away, but if you keep it, phase doesn't behave the way that normal numbers (like image intensity) do. And complex numbers aren't any easier. Networks like TasNet avoid the phase problem by learning multiple overlapping "wavelets".
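A quick way to see the trouble (a toy numpy sketch, not something from the talk): phase wraps modulo 2*pi, so averaging two nearby phases the way you would average image intensities can land on the wrong side of the circle.

```python
# Illustrative only: why phase is awkward for networks that treat it like an
# ordinary real-valued feature.  Two complex values with nearly identical
# phase average "correctly" only if you stay on the unit circle.
import numpy as np

a, b = np.exp(1j * 3.1), np.exp(-1j * 3.1)           # phases of +3.1 and -3.1 rad, almost the same angle
naive_mean = (np.angle(a) + np.angle(b)) / 2          # 0.0 -- the wrong side of the circle
circular_mean = np.angle(a + b)                       # ~pi, the sensible answer
print(naive_mean, circular_mean)
```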
I’m happy to welcome Prof. Les Atlas (UW) back to the Hearing Seminar. Les has done lots of creative work in audio signal processing, and wrote one of the first papers on active learning. He’s a DSP theory guy who knows how to apply the theory to practical problems. And fun problems like audio.
Who: Prof. Les Atlas (UW)
What: Better clipping for audio spectrogram DNNs
When: Friday March 3, 2023 at 10:30AM
Where: CCRMA Seminar Room, Top Floor of the Knoll at Stanford
Why: Because DNNs are everywhere, and audio is harder for them
Come to CCRMA and we’ll keep the clipping of your audio signals to a minimum.
- Malcolm
Complex Clipping for Improved Generalization in Machine Learning
Prof. Les Atlas, Electrical and Computer Engineering, University of Washington
Abstract—For many machine learning applications, a common input representation is a spectrogram. The underlying representation for a spectrogram is the short-time Fourier transform (STFT), which gives complex values; the spectrogram is the magnitude of these complex values. Modern machine learning systems such as deep nets are commonly overparameterized, and the ill-conditioning problems this can cause are reduced by regularization. The common use of rectified linear unit (ReLU) activation functions between the layers of a deep net has been shown to help this regularization, improving generalization performance. We extend this idea of ReLU activation to the complex output of the STFT, providing a simple-to-compute, modified, and regularized spectrogram, which potentially results in better-behaved deep net training. We then confirmed the benefit of this approach on a noisy acoustic data set used for a real-world application: an app that detects COVID-19 from the acoustic signal of a patient's cough. Generalization performance improved substantially. This approach might benefit other applications that currently use spectrogram representations. There might also be a relationship between this result and possibly sparse and efficient representations in mammalian audition. But that last point is speculative, and only for after-talk discussions over coffee or beer. (This work is joint with Nicholas Rasmussen, Virufy, Los Altos, CA, and the CS Department, The University of South Dakota; Felix Schwock, ECE, UW; and Prof. Mert Pilanci, EE, Stanford.)
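The abstract does not spell out the exact form of the complex clipping, so the following is only a minimal sketch of one plausible reading: apply a ReLU-style threshold to the complex STFT (here, zeroing bins whose magnitude falls below a level relative to the peak) before taking the magnitude spectrogram. The function name, threshold rule, and parameters are assumptions for illustration, not the speaker's published method.

```python
# Hypothetical sketch of a "clipped" spectrogram front end, written from the
# abstract alone; the talk's actual definition of complex clipping may differ.
import numpy as np
from scipy.signal import stft


def complex_clip_spectrogram(x, fs, threshold_db=-60.0, nperseg=512):
    """Magnitude spectrogram after zeroing low-magnitude complex STFT bins.

    threshold_db is relative to the largest-magnitude bin.  Bins below the
    threshold are set to zero (a ReLU-like clip on magnitude, phase kept)
    before the magnitude is taken and handed to the DNN.
    """
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)        # complex STFT, shape (freq, time)
    mag = np.abs(Z)
    threshold = mag.max() * 10.0 ** (threshold_db / 20.0)
    clipped = np.where(mag >= threshold, Z, 0.0)     # ReLU-style clip on the complex values
    return np.abs(clipped)                           # regularized spectrogram


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(fs)  # tone plus noise
    S = complex_clip_spectrogram(x, fs)
    print(S.shape, S.max())
```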
Biography—After helping Stanford Profs. Robert White and F. Blair Simmons devise the first speech processor for multichannel cochlear implants, Les Atlas became a faculty member at the University of Washington. He started with research in selective sampling and active learning for machine learning, but few cared about that work back then, so he moved on to signal processing. Decades later, the Cohn, Atlas, and Ladner paper on active learning (in the journal Machine Learning) is his most cited paper. He continues to try to avoid proposing research ideas too early or too late. While not guaranteed, this presentation is hopefully neither. You are welcome to attend and to determine that for yourself.
FREE
Open to the Public