Transformers for Applications in Audio, Speech and Music: From Language Modeling to Understanding to Synthesis

Date: Thu, 05/19/2022, 5:30pm - 6:30pm
Location: CCRMA Classroom (Knoll 217)
Event Type: DSP Seminar
Abstract: Transformers have touched many fields of research, and music/audio is no exception. This talk presents three of my papers as case studies in how we can leverage the power of Transformers for representation learning, signal processing, and clustering. First, we discuss how we beat the wildly popular WaveNet architecture, proposed by Google DeepMind for raw audio synthesis, and how we sidestepped the quadratic constraint of Transformers by conditioning on context. Second, we present a version of Audio Transformers for large-scale audio understanding, inspired by ViT and operating on raw waveforms. It combines powerful ideas from traditional signal processing, specifically wavelets applied to intermediate transformer embeddings, to produce state-of-the-art results. Investigating why the front end performs so well, we show that it learns an auditory filter bank whose time-frequency representation is optimized for the task. Third, we discuss the power of operating on latent-space encodings, and of language modeling on continuous audio signals using discrete tokens, showing how simple unsupervised tasks can give strong results competitive with end-to-end supervision. We also give an overview of recent trends in the field, including papers by Google, OpenAI, and others, on current "fashion trends". It will be fun too! Finally, as time permits, we will discuss our advances in packet-loss concealment for network music performance, and touch on the power of approaches based purely on representation learning, without any modern neural nets, and on building learning systems of that nature.
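The idea of language modeling on continuous audio via discrete tokens can be illustrated with mu-law companding, the 8-bit quantization scheme WaveNet used to treat a raw waveform as a categorical sequence. This is a generic sketch of that standard technique, not the speaker's actual tokenizer:

```python
import numpy as np

def mu_law_encode(x, n_tokens=256):
    """Map a waveform in [-1, 1] to integer tokens via mu-law companding."""
    mu = n_tokens - 1
    # Compand: compress amplitudes logarithmically into [-1, 1].
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Quantize to integers in [0, mu] (the +0.5 rounds to nearest).
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(tokens, n_tokens=256):
    """Invert the mapping back to an approximate waveform."""
    mu = n_tokens - 1
    y = 2 * tokens.astype(np.float64) / mu - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

# A toy "audio" signal becomes a token sequence a language model can ingest.
x = np.sin(np.linspace(0, 2 * np.pi, 8))
tokens = mu_law_encode(x)     # integers in [0, 255]
x_hat = mu_law_decode(tokens) # close reconstruction of x
```

Once audio is a sequence of integers, next-token prediction applies exactly as in text language modeling; the quantization error is perceptually small because mu-law allocates more levels to low amplitudes.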

This talk was originally given for CS 25 in the Fall of 2021 at Stanford University.

This work was done in collaboration with Prof. Chris Chafe, Prof. Jonathan Berger, and Prof. Julius Smith, all at the Center for Computer Research in Music and Acoustics at Stanford University. Thanks to Stanford’s Institute for Human-Centered AI (HAI) for supporting this work with a generous Google cloud computing grant.

Bio: Prateek Verma is currently working on audio research at Stanford's Center for Computer Research in Music and Acoustics (CCRMA), collaborating with Prof. Chris Chafe and Prof. Jonathan Berger. He received his master's degree from Stanford CCRMA, and before that he was at IIT Bombay.
FREE
For CCRMA Users Only
