Date:
Thu, 05/19/2022 - 5:30pm - 6:30pm
Location:
CCRMA Classroom [Knoll 217]
Abstract: Transformers have touched many fields of research, and music/audio is no different. This talk will present three of my papers as case studies on how we can leverage the power of Transformers in representation learning, signal processing, and clustering. First, we discuss how we were able to beat the wildly popular WaveNet architecture, proposed by Google DeepMind, for raw audio synthesis, and how we overcame the quadratic complexity constraint of Transformers by conditioning on context. Second, a version of Audio Transformers for large-scale audio understanding, inspired by ViT and operating on raw waveforms, is presented. It combines powerful ideas from traditional signal processing, specifically wavelets, applied to intermediate transformer embeddings, to produce state-of-the-art results. Investigating the front end to see why it performs so well, we show that it learns an auditory filter bank with a time-frequency representation optimized for the task. Third, we discuss the power of operating on latent-space encodings and of language modeling over continuous audio signals using discrete tokens, describing how simple unsupervised tasks can give results competitive with end-to-end supervised approaches. We also give an overview of recent work by Google, OpenAI, and others on current “fashion trends” in the field. It will be fun too! Finally, as time permits, we will discuss our advances in packet-loss concealment for network music performance, and touch on the power of approaches based purely on representation learning, without any modern neural nets, and on building learning systems of that nature.
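To give a flavor of the wavelet idea mentioned above, the short Python sketch below shows one way a Haar-style multi-scale decomposition could be applied to a sequence of intermediate transformer embeddings. It is a minimal, hypothetical illustration only, not the method used in the Audio Transformers paper; the function name, shapes, and pooling choices are assumptions for the example.

import numpy as np

def wavelet_pool(embeddings, levels=3):
    """Haar-style multi-scale pooling over intermediate transformer
    embeddings of shape (time_steps, dim). At each level, adjacent
    time steps are split into an average (coarse) band and a
    difference (detail) band; the detail band is summarized per level
    and the coarse band is decomposed again, giving a multi-scale
    feature vector. Illustrative sketch, not the paper's method."""
    x = np.asarray(embeddings, dtype=np.float32)
    pooled = []
    for _ in range(levels):
        if x.shape[0] % 2:                       # pad to an even length
            x = np.vstack([x, x[-1:]])
        coarse = (x[0::2] + x[1::2]) / 2.0       # approximation band
        detail = (x[0::2] - x[1::2]) / 2.0       # detail band
        pooled.append(detail.mean(axis=0))       # summarize this scale
        x = coarse                               # recurse on the coarser band
    pooled.append(x.mean(axis=0))                # coarsest approximation
    return np.concatenate(pooled)

# Example: 200 frames of 64-dim embeddings -> one fixed-size vector
features = wavelet_pool(np.random.randn(200, 64), levels=3)
print(features.shape)                            # (4 * 64,) = (256,)

In such a scheme, the fixed-size multi-scale vector could then feed a classifier head for audio understanding tasks.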
This talk was originally given for CS 25 in the Fall of 2021 at Stanford University.
This work was done in collaboration with Prof. Chris Chafe, Prof. Jonathan Berger, and Prof. Julius Smith, all at the Center for Computer Research in Music and Acoustics at Stanford University. Thanks to Stanford’s Institute for Human-Centered AI (HAI) for supporting this work with a generous Google Cloud computing grant.
Bio: Prateek Verma is currently working on audio research at Stanford’s Center for Computer Research in Music and Acoustics (CCRMA), collaborating with Prof. Chris Chafe and Prof. Jonathan Berger. He received his master’s degree from Stanford CCRMA, and before that he was at IIT Bombay.