Prateek Verma - Fourier Transforms and Filter-Banks in the Era of Transformers and GPT

Date: Fri, 04/07/2023, 10:30am - 12:00pm
Location: CCRMA Seminar Room
Event Type: Hearing Seminar
Is there still room in this age of transformers and ChatGPT for a little bit of knowledge about signal processing and auditory modeling? ChatGPT, AudioLM (https://google-research.github.io/seanet/audiolm/examples/), USM (https://sites.research.google/usm/), and their ilk have taken over the world, using lots of data to solve hard problems at superhuman levels. It's really quite amazing. (I know other groups have done similarly impressive models, but I know the Google examples best. Don't take this as an endorsement.)

Prateek Verma has done a large number of interesting audio ML experiments, spanning speech, music, and many other problem areas. He'll be talking about learning a basis for the front end.

Who: Prateek Verma
What: Fourier Transforms and Filter-Banks in the Era of Transformers and GPT
When: Friday April 7 at 10:30AM
Where: CCRMA Seminar Room (top floor of the Knoll at Stanford)
Why: Usually a little bit of knowledge goes a long way. Is that still true?

See you at CCRMA. Bring your favorite auditory front end.

- Malcolm

Prateek Verma
Fourier Transforms and Filter-Banks in the Era of Transformers and GPT

Abstract:
Transformers have revolutionized the field of artificial intelligence by propelling powerful self-supervised architectures such as GPT and, recently, ChatGPT. They are advancing the state of the art in almost every problem thrown at them.

This talk will re-imagine Fourier transforms in this age of Transformers/GPT. Before the advent of modern deep learning, we used fixed, non-learnable front ends such as the spectrogram or mel-spectrogram, with or without neural architectures, for music and audio research. As convolutional architectures came to support applications such as ASR and audio understanding, the field shifted to learnable front ends, in which both the basis functions and their weights are learned from scratch and optimized for the particular task of interest (e.g., the raw-waveform CLDNN). With the shift to transformer-based architectures containing no convolutional blocks, a linear layer projects small waveform patches onto a small latent dimension before feeding them to the transformer.
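For the curious, here is a minimal sketch of these three generations of front ends, written in PyTorch/torchaudio. It is my illustration, not code from the talk, and every parameter choice (16 kHz audio, 64 channels, a 400-sample window/patch) is arbitrary:

    import torch
    import torch.nn as nn
    import torchaudio

    # (1) Fixed, non-learnable front end: a mel-spectrogram built on a
    #     sinusoidal Fourier basis and a fixed mel filter bank.
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000,
                                               n_fft=400, n_mels=64)

    # (2) Learnable front end (CLDNN-style): a bank of 1-D convolutional
    #     filters applied to the raw waveform, whose basis functions are
    #     learned from scratch for the task at hand.
    learnable_fb = nn.Conv1d(in_channels=1, out_channels=64,
                             kernel_size=400, stride=160)

    # (3) Transformer-style front end: chop the waveform into short
    #     patches and project each one to a small latent dimension with
    #     a single linear layer, then hand the sequence to a Transformer.
    patch_len, d_model = 400, 64
    patch_proj = nn.Linear(patch_len, d_model)

    wav = torch.randn(1, 16000)                        # 1 s of audio at 16 kHz

    spec = mel(wav)                                    # (1, 64, frames): fixed features
    filt = torch.relu(learnable_fb(wav.unsqueeze(1)))  # (1, 64, frames): learned features
    patches = wav.unfold(-1, patch_len, patch_len)     # (1, 40, 400): waveform patches
    tokens = patch_proj(patches)                       # (1, 40, 64): Transformer tokens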

What can be the next evolution in this series? Can we learn a better time-frequency representation according to the constraints we provide, by making these front-end transforms entirely learnable for a given task? Additionally, we will explore the strengths of wavelet transforms combined with powerful Transformer architectures, and showcase the gains achieved on acoustic understanding tasks. We will see significant improvements in performance, with no extra parameters added, by incorporating various inductive biases of audio signals. Then we will tinker with these front ends and open them up to explore what they learn. We will see that they acquire quite a rich vocabulary of basis functions, all learned from scratch rather than taken from a sinusoidal Fourier basis, discovering all kinds of signal-processing properties. This work can potentially impact every audio/signal-processing task that takes a Fourier transform as its first step or operates directly on raw waveforms with neural architectures such as Transformers. It pieces together almost three decades of signal-processing research, from the STFT, to filter banks, to CLDNN acoustic models, to the current era of Transformers.
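To give a flavor of what "entirely learnable, with essentially no extra parameters" could look like, here is a hypothetical Morlet-style front end in which each filter contributes only two learnable scalars, a center frequency and a scale. This is my own sketch under those assumptions, not the architecture from the talk:

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LearnableWaveletFrontEnd(nn.Module):
        """Hypothetical filter bank: Morlet-like wavelets whose center
        frequencies and scales are learned, i.e. 2 scalars per channel."""

        def __init__(self, n_filters=64, kernel_size=401, sample_rate=16000):
            super().__init__()
            self.kernel_size = kernel_size
            # Initialize center frequencies on a log scale, an
            # auditory-style inductive bias.
            freqs = torch.logspace(math.log10(50),
                                   math.log10(0.45 * sample_rate), n_filters)
            self.center = nn.Parameter(freqs / sample_rate)  # normalized frequency
            self.scale = nn.Parameter(torch.full((n_filters,), 0.01))

        def filters(self):
            # Centered time axis, shape (1, kernel_size).
            t = (torch.arange(self.kernel_size).float()
                 - self.kernel_size // 2)[None, :]
            envelope = torch.exp(-(self.scale[:, None] * t) ** 2)  # Gaussian window
            carrier = torch.cos(2 * math.pi * self.center[:, None] * t)
            return envelope * carrier                    # (n_filters, kernel_size)

        def forward(self, wav):                          # wav: (batch, samples)
            kernels = self.filters().unsqueeze(1)        # (n_filters, 1, K)
            return F.conv1d(wav.unsqueeze(1), kernels, stride=160)

    frontend = LearnableWaveletFrontEnd()
    features = frontend(torch.randn(2, 16000))           # (2, 64, 98)

Because only the wavelet parameters are trainable, the time-frequency tiling can adapt to the task while the parameter count stays tiny compared with a free-form convolutional filter bank.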


This is joint work with Chris Chafe, with the backbone Transformer architecture developed jointly with Jonathan Berger in 2021, all at Stanford University.


Bio:
Prateek Verma is currently a researcher at Stanford University. He has held research positions in various interdisciplinary groups at Stanford and has published in a variety of conferences and journals, drawing on his acoustics/music/signal-processing background. He received his Master's degree from Stanford CCRMA and did his AI residency at Google X, initiating a new direction for robotics research. Before coming to Stanford, he graduated from IIT Bombay in Electrical Engineering with a specialization in Signal Processing. His primary research interest and passion lie at the intersection of classic signal processing, acoustics, music/audio/speech processing, AI, music information retrieval, and music understanding/synthesis.

Background reading:
Audio Transformers: Transformers For Large Scale Audio Understanding — Adieu Convolutions
https://arxiv.org/abs/2105.00335

A Content Adaptive Front End For Audio Signal Processing
https://arxiv.org/abs/2303.10446

FREE
Open to the Public