Deep Learning for Music Information Retrieval I: How Neural Networks Learn Audio

Workshop Date:

Mon, 07/22/2024 - Fri, 07/26/2024

This workshop offers a fast-paced introduction to audio and music processing with deep learning to bring you up to speed with the state-of-the-art practice in 2024. Participants will learn to build tools to analyze and manipulate digital audio signals with PyTorch. Both theory and practice of digital audio processing will be discussed with hands-on exercises on algorithm implementation. These concepts will be applied to various topics in music information retrieval. Some knowledge of python and strong reasoning skills are assumed.

In-person (CCRMA, Stanford) and online enrollment options available. Students will receive the same teaching materials and have access to the same tutorials in either format. However, students will gain access to more in-depth, hands-on 1:1 instructor discussion and feedback when taking the course in-person.

Schedule

Day 1

Review: Python/pytorch and linear algebra Theory: The Discrete Fourier Transform and Spectral Representations of Audio Hands-on: Linear and logistic regression.

Day 2

Review: Differential calculus. Theory: Softmax and feedforward neural networks. Hands-on: Building a musical instrument classifier in PyTorch.

Day 3

Theory: Convolution and CNNs. Theory: Generative VAEs, KL divergence. Hands-on: Musical tone generation with a pitch-conditioned VAE.

Day 4

Theory: Time-series modeling, RNNs and WaveNet. Theory: Attention mechanisms and transformers. Literature: Contrastive learning and AudioCLIP

Day 5

Hands-on: AudioCLIP for semantic audio understanding. Hands-on: Music synthesis with a Transformer

About the instructors

Iran R. Roman is a faculty member at Queen Mary University London, leading research in theoretical neuroscience and machine perception. He holds a PhD from CCRMA. Iran is a passionate instructor and mentor, with extensive experience teaching AI and signal processing at institutions like Stanford University, New York University, and the National Autonomous University of Mexico. He has worked with companies companies like Plantronics, Apple, Oscilloscape, Tesla, and Raytheon/BBN to build and deploy AI models. iranroman.github.io

Chuyang Chen is a student and research assistant at New York University’s Music and Audio Research Laboratory. With a background in music technology, computer science, and electrical engineering, Chuyang is passionate about building machine listening systems using artificial intelligence, signal processing, and mathematical modeling techniques. His past research topics include beat tracking, music similarity, urban acoustics, and audio-visual analysis.

Search this site:

Spring Quarter 2024

Music 101 Introduction to Creating Electronic Sounds
Music 128 Stanford Laptop Orchestra (SLOrk)
Music 155/255 (ARTSTUDI 239) Intermedia Workshop
Music 220C Research Seminar in Computer-Generated Music
Music 222A Quantum Computer Music
Music 228 SVOrk (Stanford Virtual Reality Orchestra)
Music 250A Physical Interaction Design for Music
Music 254 Computational Music Analysis
Music 257 Neuroplasticity and Musical Gaming
Music 319 Research Seminar on Computational Models of Sound Perception
Music 320C Audio DSP Projects in Faust and C++
Music 423 Graduate Research in Music Technology

Main menu

Secondary menu