Les Atlas (UW) - Better clipping for audio spectrogram DNNs

Date: Fri, 03/03/2023 - 10:30am - 12:00pm
Location: CCRMA Seminar Room
Event Type: Hearing Seminar
Deep Neural Networks (DNNs) are everywhere, and have enabled all sorts of amazing solutions: speech recognition and translation, all sorts of image applications, and now ChatGPT (a.k.a. a stochastic parrot that hallucinates).

But audio has always been troublesome for these networks. What the heck do you do with that damn phase? Sometimes you can just throw it away, but if you keep it, phase doesn’t behave the way normal numbers (like image intensity) do. And complex numbers aren’t any easier. Networks like TasNet sidestep the phase problem by learning multiple overlapping “wavelets” instead.
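
For the curious, here is a minimal sketch of the problem, assuming a standard NumPy/SciPy STFT pipeline (the signal and parameters below are purely illustrative):

    import numpy as np
    from scipy.signal import stft

    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t)        # one second of a 440 Hz tone

    _, _, Z = stft(x, fs=fs, nperseg=512)  # Z holds complex STFT values
    magnitude = np.abs(Z)                  # the usual spectrogram: phase discarded
    phase = np.angle(Z)                    # wraps to (-pi, pi]

    # Phase wraps around: values near +pi and -pi are physically close but
    # numerically far apart, which is exactly what trips up network layers
    # built for "normal numbers" like pixel intensity.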

I’m happy to welcome Prof. Les Atlas (UW) back to the Hearing Seminar. Les has done lots of creative work in audio signal processing, and wrote one of the first papers on active learning. He’s a DSP theory guy who knows how to apply the theory to practical problems. And fun problems like audio.

Who: Prof. Les Atlas (UW)
What: Better clipping for audio spectrogram DNNs
When: Friday March 3, 2023 at 10:30AM
Where: CCRMA Seminar Room, Top Floor of the Knoll at Stanford
Why: Because DNNs are everywhere, and audio is harder for them

Come to CCRMA and we’ll keep the clipping of your audio signals to a minimum.

- Malcolm



Complex Clipping for Improved Generalization in Machine Learning
Prof. Les Atlas, Electrical and Computer Engineering, University of Washington

Abstract—For many machine learning applications, a common input representation is the spectrogram. The underlying representation for a spectrogram is the short-time Fourier transform (STFT), which yields complex values; the spectrogram is the magnitude of those values. Modern machine learning systems like deep nets are commonly overparameterized, and the resulting ill-conditioning can be reduced by regularization. The common use of rectified linear unit (ReLU) activation functions between the layers of a deep net has been shown to aid this regularization, improving generalization performance. We extend this idea of ReLU activation to the complex output of the STFT, yielding a simple-to-compute, modified and regularized spectrogram that can make deep net training better behaved. We then confirmed the benefit of this approach on a noisy acoustic data set from a real-world application: an app that detects COVID-19 from the acoustic signal of a patient’s cough. Generalization performance improved substantially. This approach might benefit other applications that currently use spectrogram representations. There may also be a relationship between this result and sparse, efficient representations in mammalian audition, but that last point is speculative, and only for after-talk discussions over coffee or beer. (This work is joint with Nicholas Rasmussen, Virufy, Los Altos, CA and CS Department, The University of South Dakota; Felix Schwock, ECE, UW; and Prof. Mert Pilanci, EE, Stanford.)
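
The abstract doesn’t spell out the exact clipping rule, so here is a minimal sketch assuming a CReLU-style extension of ReLU (rectifying the real and imaginary parts of each STFT bin independently); the rule presented in the talk may differ:

    import numpy as np
    from scipy.signal import stft

    def complex_relu(z):
        # Assumed CReLU-style clipping: ReLU applied separately to the
        # real and imaginary parts of each complex STFT value.
        return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

    fs = 16000
    x = np.random.randn(fs)                # stand-in for a one-second recording
    _, _, Z = stft(x, fs=fs, nperseg=512)  # complex STFT values

    S = np.abs(Z)                          # conventional spectrogram
    S_clipped = np.abs(complex_relu(Z))    # modified, regularized spectrogram

The claim in the abstract is that feeding a net the clipped representation, rather than the plain magnitude, makes training better conditioned, analogous to what ReLU does between layers.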

Biography—After helping Stanford Profs. Robert White and F. Blair Simmons devise the first speech processor for multichannel cochlear implants, Les Atlas became a faculty member at the University of Washington. He started with research on selective sampling and active learning for machine learning, but few cared about that work back then, so he moved on to signal processing. Decades later, the Cohn, Atlas, and Ladner paper on active learning (in the Journal of Machine Learning Research) is his most cited paper. He continues to try to avoid proposing research ideas too early or too late. While not guaranteed, this presentation is hopefully neither. You are welcome to attend and determine that for yourself.

FREE
Open to the Public