Deep Learning for Music Information Retrieval I
This workshop offers a fast-paced introduction to audio and music processing with deep learning, bringing you up to speed with state-of-the-art practice in 2024. Participants will learn to build tools that analyze and manipulate digital audio signals with PyTorch. The course covers both the theory and practice of digital audio processing, with hands-on algorithm-implementation exercises, and applies these concepts to a range of topics in music information retrieval. Some knowledge of Python and strong reasoning skills are assumed.
In-person (CCRMA, Stanford) and online enrollment options are available. Students will receive the same teaching materials and have access to the same tutorials in either format. However, students taking the course in person will get more in-depth, hands-on 1:1 instructor discussion and feedback.
Schedule
Day 1
Review: Python/PyTorch and linear algebra.
Theory: The discrete Fourier transform and spectral representations of audio (see the sketch below).
Hands-on: Linear and logistic regression.
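To give a flavor of the material, here is a minimal PyTorch sketch of the Day 1 spectral-representation topic: computing a short-time Fourier transform of a toy sine tone. The sample rate, tone frequency, and STFT parameters are illustrative choices, not the workshop's actual exercise.

```python
import math
import torch

# Toy signal: one second of a 440 Hz sine at a 16 kHz sample rate
# (hypothetical values, purely for illustration).
sr = 16000
t = torch.arange(sr) / sr
x = torch.sin(2 * math.pi * 440.0 * t)

# Short-time Fourier transform: a standard spectral representation of audio.
spec = torch.stft(x, n_fft=1024, hop_length=256,
                  window=torch.hann_window(1024), return_complex=True)
magnitudes = spec.abs()   # shape: (513 frequency bins, num frames)
print(magnitudes.shape)
```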
Day 2
Review: Differential calculus.
Theory: Softmax and feedforward neural networks.
Hands-on: Building a musical instrument classifier in PyTorch (see the sketch below).
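A minimal sketch of the Day 2 material: a feedforward network trained with a softmax-based cross-entropy loss, as in the instrument-classifier exercise. The layer sizes, class count, and random stand-in data are hypothetical.

```python
import torch
import torch.nn as nn

# Feedforward classifier: spectral features in, instrument-class logits out.
model = nn.Sequential(
    nn.Linear(513, 128),   # e.g. one STFT frame's magnitude bins
    nn.ReLU(),
    nn.Linear(128, 10),    # e.g. 10 instrument classes (hypothetical)
)
criterion = nn.CrossEntropyLoss()   # applies log-softmax internally
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(32, 513)        # stand-in batch of features
labels = torch.randint(0, 10, (32,))   # stand-in labels

# One training step: forward pass, loss, backprop, parameter update.
loss = criterion(model(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```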
Day 3
Theory: Convolution and CNNs.
Theory: Generative VAEs and the KL divergence (see the sketch below).
Hands-on: Musical tone generation with a pitch-conditioned VAE.
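A minimal sketch of the Day 3 VAE material: the reparameterization trick and the closed-form KL divergence between the encoder's Gaussian posterior and a standard normal prior. The batch and latent sizes are hypothetical, and this is a generic VAE fragment rather than the workshop's pitch-conditioned model.

```python
import torch

# Encoder outputs for a batch of 8 examples with a 16-dim latent space
# (random stand-ins; a real encoder would produce these from audio).
mu = torch.randn(8, 16)       # posterior mean
logvar = torch.randn(8, 16)   # posterior log-variance

# Reparameterization trick: sample z ~ N(mu, sigma^2) differentiably.
eps = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * eps

# Closed-form KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
```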
Day 4
Theory: Time-series modeling, RNNs and WaveNet.
Theory: Attention mechanisms and transformers (see the sketch below).
Literature: Contrastive learning and AudioCLIP.
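A minimal sketch of the Day 4 attention material: scaled dot-product self-attention, the core operation of the transformer. Tensor shapes are hypothetical.

```python
import math
import torch

def attention(q, k, v):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(1, 50, 64)   # (batch, time, dim), hypothetical sizes
out = attention(q, q, q)     # self-attention over a 50-step sequence
print(out.shape)             # (1, 50, 64)
```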
Day 5
Hands-on: AudioCLIP for semantic audio understanding (see the sketch below).
Hands-on: Music synthesis with a transformer.
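A minimal sketch of the core idea behind the Day 5 AudioCLIP exercise: ranking candidate text labels by cosine similarity to an audio embedding in a shared space. The embeddings here are random stand-ins; in the exercise they would come from AudioCLIP's audio and text encoders.

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings in a shared 512-dim space (hypothetical size).
audio_emb = F.normalize(torch.randn(1, 512), dim=-1)   # one audio clip
text_embs = F.normalize(torch.randn(5, 512), dim=-1)   # 5 candidate labels

# Cosine similarity between the clip and every label; pick the best match.
similarity = audio_emb @ text_embs.T   # shape (1, 5)
best = similarity.argmax(dim=-1)
```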
About the instructors
Iran R. Roman is a faculty member at Queen Mary University of London, leading research in theoretical neuroscience and machine perception. He holds a PhD from CCRMA. Iran is a passionate instructor and mentor with extensive experience teaching AI and signal processing at institutions including Stanford University, New York University, and the National Autonomous University of Mexico. He has worked with companies such as Plantronics, Apple, Oscilloscape, Tesla, and Raytheon/BBN to build and deploy AI models. iranroman.github.io
Chuyang Chen is a student and research assistant at New York University’s Music and Audio Research Laboratory. With a background in music technology, computer science, and electrical engineering, Chuyang is passionate about building machine listening systems using artificial intelligence, signal processing, and mathematical modeling techniques. His past research topics include beat tracking, music similarity, urban acoustics, and audio-visual analysis.
Scholarship opportunity: https://forms.gle/sgXhUPdJhskHXFJt8