Deep Learning for Music Information Retrieval I: How Neural Networks Learn Audio

Workshop Dates:
Mon, 08/08/2022 - Fri, 08/12/2022


This workshop will cover industry-standard methods for developing deep neural network architectures for digital audio. Over five immersive days of study, we will work through the theoretical, mathematical, and practical principles that deep learning researchers use every day in the real world. Our schedule will be:

 

Day 1: Learning mechanisms of feedforward neural networks

Math - Linear algebra and differential calculus review. The mathematics of feedforward neural networks.

Theory - How synaptic neuroplasticity inspired the backpropagation algorithm.

Practice a) - Writing a feedforward neural network with backpropagation using NumPy.

Practice b) - Automating differentiation with TensorFlow 2.
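To give a flavor of the Day 1 material, here is a minimal sketch of a one-hidden-layer network trained with handwritten backpropagation in NumPy. The architecture, learning rate, and the XOR toy task are illustrative choices, not the workshop's actual assignment:

```python
import numpy as np

# Toy task: learn XOR with a 2-8-1 network and handwritten backprop.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
losses = []
for step in range(2000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)            # hidden activations
    p = sigmoid(h @ W2 + b2)            # output probabilities
    losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    # Backward pass: cross-entropy through a sigmoid output reduces to p - y.
    dout = (p - y) / len(X)
    dW2, db2 = h.T @ dout, dout.sum(0)
    dh = (dout @ W2.T) * h * (1 - h)    # chain rule through the hidden sigmoid
    dW1, db1 = X.T @ dh, dh.sum(0)
    # Gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

In the afternoon session, the same gradients are produced automatically (e.g. with TensorFlow's gradient tape), which is the point of the Practice b) exercise.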

 

Day 2: Building blocks of deep learning architectures

Math - Activation functions. Norm functions. Convolution. Momentum.

Theory - Recurrent computations. Backpropagation through time. Convolutional layers. Gradient descent. Weight regularization. Best practices for parameter initialization.

Practice a) - Writing a recurrent neural network in TensorFlow 2 and/or PyTorch.
Practice b) - Writing a convolutional neural network in TensorFlow 2 and/or PyTorch.
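The core operation of the Day 2 convolutional layers can be sketched in a few lines of NumPy. This is a hypothetical helper for illustration (deep learning frameworks implement the same operation, usually as cross-correlation, with learned kernels and many channels):

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation: slide the kernel along the
    signal and take a dot product at each position (what a framework
    conv layer computes, with the kernel learned by gradient descent)."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

x = np.array([1., 2., 3., 4., 5.])
edge = np.array([1., -1.])   # simple difference kernel
out = conv1d(x, edge)        # detects the constant slope of x
```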

 

Day 3: Training a state-of-the-art model for digital audio

Math - Probability review. Gaussian distribution. Loss functions. MFCCs.

Theory - Feature embedding. Optimization algorithms. Batch normalization. The Keras library. Parallel GPU training. Tensorboard.

Practice a) - Optimizing a Convolutional Neural Network for music genre identification.

Practice b) - Optimizing a Recurrent Neural Network for automatic speech recognition.
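One of the Day 3 training tools, batch normalization, is easy to state directly: normalize each feature over the batch, then apply a learnable scale and shift. A minimal sketch (fixed `gamma`/`beta` here for illustration; in a real layer they are trained parameters, with running statistics kept for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column over the batch, then scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta

batch = np.array([[1., 10.], [2., 20.], [3., 30.]])
out = batch_norm(batch)
# Each column of `out` now has (approximately) zero mean and unit variance.
```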

 

Day 4: Audio generation and machine translation

Math - Principal component analysis. Kullback-Leibler divergence.

Theory - Dilated convolutions. Dimensionality reduction. Variational autoencoders. Self-attention.

Practice a) - Audio generation with WaveNet.

Practice b) - Audio generation with MusicVAE.

Practice c) - Neural machine translation with the Transformer.
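The self-attention mechanism at the heart of the Day 4 Transformer material fits in a short NumPy sketch. This is a single-head, batch-free version with random projection matrices for illustration (real implementations add multiple heads, masking, and learned parameters):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence (single head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise similarities
    # Softmax over the key axis turns scores into attention weights.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V, w       # weighted mix of values, plus the weights

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))                   # 4 time steps, 8-dim features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)  # out: (4, 8), weights: (4, 4)
```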

 

Day 5: Capstone project

Math - Open for personalized review with the instructors.

Theory - Open for personalized review with the instructors.

Practice - Solve a problem of your choice with deep learning. Get personalized guidance from the instructors. Present your results to the class.

 

Who is this workshop for?

This course has been designed for people who want to gain serious experience using deep neural networks to solve digital audio problems with state-of-the-art performance. Participants are assumed to have previous knowledge of linear algebra and differential calculus, as well as programming experience with Python. Individuals who already have deep learning experience will also benefit from this workshop's hands-on approach using both TensorFlow 2 and PyTorch. Previous workshop attendees include engineers and scientists now working at tech companies such as Apple, Microsoft, and Amazon, as well as students now pursuing graduate studies in artificial intelligence at prestigious institutions all over the world.

Prerequisites:

- Email the instructors to confirm eligibility.


Enrollment Options:

In-person (CCRMA, Stanford) and online enrollment options are available during registration. Students receive the same teaching materials and have access to the same tutorials in either format. In-person students will additionally gain access to more in-depth, hands-on 1:1 instructor discussion and feedback.

About the instructors:

Camille Noufi is a PhD student and researcher at the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University. Camille studies machine generation of expressive communication and the acoustic impact of the environment on the voice. Her interdisciplinary research combines signal processing (DSP), machine learning (ML), and human-computer interaction (HCI) with psychology and vocal science. She was a research intern on the Audio Team at Meta Reality Labs in 2020. Before coming to CCRMA, she worked on audio scene analysis and vocal biomarker research at MIT Lincoln Laboratory. Her research has been presented at the Interspeech, ISMIR, and ICML conferences. camillenoufi.com

Iran R. Roman is a theoretical neuroscientist and machine listening scientist at New York University’s Music and Audio Research Laboratory. Iran is a passionate instructor, with extensive experience teaching artificial intelligence and deep learning. His industry experience includes deep learning engineering internships at Plantronics in 2017, Apple in 2018 and 2019, Oscilloscape in 2020, and Tesla in 2021. Iran’s research has focused on using deep learning for speech recognition and auditory scene analysis. iranroman.github.io


Scholarship opportunity: https://docs.google.com/forms/d/e/1FAIpQLSdL4LWoX5EpYUEp0UMFUhhmgMWOHkd8...

 

