Deep Learning for MIR II: State-of-the-art Algorithms
A survey of cutting-edge research in MIR using Deep Learning, presented by the instructors and a lineup of guest speakers leading research in industry and academia. This workshop is meant for individuals who want to gain experience applying Deep Learning to an MIR problem of their interest. Instructors will explain and demonstrate the concepts behind models used in cutting-edge industry and academic research. Students will build and train state-of-the-art models using PyTorch and GPU computing, adapting them to an MIR-related problem of their interest. Instructors will serve as on-demand advisors throughout the course.
In-person (CCRMA, Stanford) and online enrollment options are available during registration (see red button above). Students receive the same teaching materials and access to the same tutorials in either format. In-person students additionally gain more in-depth, hands-on 1:1 discussion and feedback with the instructors.
Theory includes: generative models, self-supervised feature learning, and attention mechanisms.
Models covered include: TCN, WaveNet, Transformer, RAVE, CREPE, GPT-4, etc.
Practice: music and speech recognition/synthesis, beat tracking, music recommendation, and semantic analysis.
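To give a flavor of the attention mechanisms listed above, here is a minimal sketch of scaled dot-product attention (the core operation inside Transformer-style models). It is written in NumPy purely for illustration; in the workshop itself, students use PyTorch, and all variable names and shapes here are our own choices rather than anything prescribed by the course.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Core attention operation: each query attends to all keys,
    # producing a weighted sum of the corresponding values.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query/key similarity, scaled
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted sum of values

# Toy example: 4 time steps (e.g. audio frames), embedding dim 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query position
```

In practice, Q, K, and V are learned linear projections of the input sequence, and frameworks such as PyTorch provide batched, multi-head versions of this same computation.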
Prerequisites:
- Deep Learning for MIR I
About the instructors:
Bea Steers is a research engineer at MARL (the Music and Audio Research Lab) and CUSP (the Center for Urban Science and Progress), working on data streaming and collection infrastructure and on training machine learning models for real-time data pipelines. Her projects span urban street flooding and AI-driven task assistance using AR devices.
Iran R. Roman is a theoretical neuroscientist and machine listening scientist at New York University’s Music and Audio Research Laboratory. Iran is a passionate instructor with extensive experience teaching artificial intelligence and deep learning. His industry experience includes deep learning engineering internships at Plantronics in 2017, Apple in 2018 and 2019, Oscilloscape in 2020, and Tesla in 2021. Iran’s research focuses on using deep learning for speech recognition and auditory scene analysis. iranroman.github.io
IMPORTANT: To register for this week only, contact the instructor and attach a copy of your diploma from the CCRMA Deep Learning for MIR I workshop.
Scholarship opportunity:
https://docs.google.com/forms/d/e/1FAIpQLSdL4LWoX5EpYUEp0UMFUhhmgMWOHkd8VlF70G9BK8e3-AfX2w/viewform