Les Atlas (UW) - Better clipping for audio spectrogram DNNs
Date:
Fri, 03/03/2023 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar
But audio has always been troublesome for deep networks. What the heck do you do with that damn phase? Sometimes you can just throw it away, but if you keep it, phase doesn't behave the way that normal numbers (like image intensity) do. And complex numbers aren't any easier. Networks like TasNet avoid the phase problem by learning multiple overlapping "wavelets".
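A quick way to see the trouble (a toy numpy sketch, not something from the talk): phase wraps modulo 2*pi, so averaging two nearby phases the way you would average image intensities can land on the wrong side of the circle.

```python
# Illustrative only: why phase is awkward for networks that treat it like an
# ordinary real-valued feature.  Two complex values with nearly identical
# phase average "correctly" only if you stay on the unit circle.
import numpy as np

a, b = np.exp(1j * 3.1), np.exp(-1j * 3.1)           # phases of +3.1 and -3.1 rad, almost the same angle
naive_mean = (np.angle(a) + np.angle(b)) / 2          # 0.0 -- the wrong side of the circle
circular_mean = np.angle(a + b)                       # ~pi, the sensible answer
print(naive_mean, circular_mean)
```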
I’m happy to welcome Prof. Les Atlas (UW) back to the Hearing Seminar. Les has done lots of creative work in audio signal processing, and wrote one of the first papers on active learning. He’s a DSP theory guy who knows how to apply the theory to practical problems. And fun problems like audio.
Who: Prof. Les Atlas (UW)
What: Better clipping for audio spectrogram DNNs
When: Friday March 3, 2023 at 10:30AM
Where: CCRMA Seminar Room, Top Floor of the Knoll at Stanford
Why: Because DNNs are everywhere, and audio is harder for them
Come to CCRMA and we’ll keep the clipping of your audio signals to a minimum.
- Malcolm
Complex Clipping for Improved Generalization in Machine Learning
Prof. Les Atlas, Electrical and Computer Engineering, University of Washington
Abstract—For many machine learning applications, a common input representation is a spectrogram. The underlying representation for a spectrogram is the short-time Fourier transform (STFT), which gives complex values; the spectrogram is the magnitude of these complex values. Modern machine learning systems such as deep nets are commonly overparameterized, and the ill-conditioning problems this can cause are reduced by regularization. The common use of rectified linear unit (ReLU) activation functions between the layers of a deep net has been shown to help this regularization, improving generalization performance. We extend this idea of ReLU activation to the complex output of the STFT, providing a simple-to-compute, modified, and regularized spectrogram, which potentially results in better-behaved deep net training. We then confirmed the benefit of this approach on a noisy acoustic data set used for a real-world application: an app that detects COVID-19 from the acoustic signal of a patient's cough. Generalization performance improved substantially. This approach might benefit other applications that currently use spectrogram representations. There might also be a relationship between this result and possibly sparse and efficient representations in mammalian audition. But that last point is speculative, and only for after-talk discussions over coffee or beer. (This work is joint with Nicholas Rasmussen, Virufy, Los Altos, CA, and the CS Department, The University of South Dakota; Felix Schwock, ECE, UW; and Prof. Mert Pilanci, EE, Stanford.)
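The abstract does not spell out the exact form of the complex clipping, so the following is only a minimal sketch of one plausible reading: apply a ReLU-style threshold to the complex STFT (here, zeroing bins whose magnitude falls below a level relative to the peak) before taking the magnitude spectrogram. The function name, threshold rule, and parameters are assumptions for illustration, not the speaker's published method.

```python
# Hypothetical sketch of a "clipped" spectrogram front end, written from the
# abstract alone; the talk's actual definition of complex clipping may differ.
import numpy as np
from scipy.signal import stft


def complex_clip_spectrogram(x, fs, threshold_db=-60.0, nperseg=512):
    """Magnitude spectrogram after zeroing low-magnitude complex STFT bins.

    threshold_db is relative to the largest-magnitude bin.  Bins below the
    threshold are set to zero (a ReLU-like clip on magnitude, phase kept)
    before the magnitude is taken and handed to the DNN.
    """
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)        # complex STFT, shape (freq, time)
    mag = np.abs(Z)
    threshold = mag.max() * 10.0 ** (threshold_db / 20.0)
    clipped = np.where(mag >= threshold, Z, 0.0)     # ReLU-style clip on the complex values
    return np.abs(clipped)                           # regularized spectrogram


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(fs)  # tone plus noise
    S = complex_clip_spectrogram(x, fs)
    print(S.shape, S.max())
```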
Biography—After helping Stanford Profs. Robert White and F. Blair Simmons devise the first speech processor for multichannel cochlear implants, Les Atlas became a faculty member at the University of Washington. He started with research in selective sampling and active learning for machine learning, but few cared about that work back then, so he moved on to signal processing. Decades later, the Cohn, Atlas, and Ladner paper on active learning (in the journal Machine Learning) is his most cited paper. He continues to try to avoid proposing research ideas too early or too late. While not guaranteed, this presentation is hopefully neither. You are welcome to attend and to determine that for yourself.
FREE
Open to the Public