Deep Learning for Music Information Retrieval I: How Neural Networks Learn Audio 2021
This workshop covers industry-standard methods for developing deep neural network architectures for digital audio. Over five immersive days of study, we will work through the theoretical, mathematical, and practical principles that deep learning researchers use every day in the real world. Our schedule will be:
Day 1: Learning mechanisms of feedforward neural networks
Math - Linear algebra and differential calculus review. The mathematics of feedforward neural networks.
Theory - How synaptic neuroplasticity inspired the backpropagation algorithm.
Practice a) - Writing a feedforward neural network with backpropagation using NumPy.
Practice b) - Automating differentiation with TensorFlow 2.
Day 2: Building blocks of deep learning architectures
Math - Activation functions. Norm functions. Convolution. Momentum.
Theory - Recurrent computations. Backpropagation through time. Convolutional layers. Gradient descent. Weight regularization. Best practices for parameter initialization.
Practice a) - Writing a recurrent neural network in TensorFlow 2.
Practice b) - Writing a convolutional neural network in TensorFlow 2.
Day 3: Training a state-of-the-art model for digital audio
Math - Probability review. Gaussian distribution. Loss functions. MFCCs.
Theory - Feature embedding. Optimization algorithms. Batch normalization. The Keras library. Parallel GPU training. TensorBoard.
Practice a) - Optimizing a convolutional neural network for music genre identification.
Practice b) - Optimizing a recurrent neural network for automatic speech recognition.
Day 4: Audio generation and machine translation
Math - Principal component analysis. Kullback-Leibler divergence.
Theory - Dilated convolutions. Dimensionality reduction. Variational autoencoders. Self-attention.
Practice a) - Audio generation with WaveNet.
Practice b) - Audio generation with MusicVAE.
Practice c) - Neural machine translation with the Transformer.
Day 5: Capstone project
Math - Open for personalized review with the instructors.
Theory - Open for personalized review with the instructors.
Practice - Solve a problem of your choice with deep learning. Get personalized guidance from the instructors. Present your results to the class.
Who is this workshop for?
This course is designed for people who want to gain serious experience using deep neural networks to solve digital audio problems with state-of-the-art performance. Participants are assumed to have prior knowledge of linear algebra and differential calculus, as well as programming experience with Python. Individuals who already have experience with deep learning will also benefit from this workshop's hands-on approach using TensorFlow 2. Previous workshop attendees include engineers and scientists now working at tech companies like Apple, Microsoft, and Amazon, as well as students now pursuing graduate studies in artificial intelligence at prestigious institutions all over the world.
Prerequisites:
- Introduction to Music Information Retrieval (CCRMA workshop or equivalent). Email instructors to confirm eligibility.
About the instructors:
Elena Georgieva is a PhD student and researcher at NYU’s Music and Audio Research Lab (MARL). Before joining MARL, Elena taught sound recording and managed the recording studio at CCRMA, where she completed her master's degree in Music, Science, and Technology in 2019. Elena has expertise in music information retrieval, machine learning, sound recording, and vocals. She has presented her work at the ISMIR and ICML conferences, at Stanford and Berkeley, and at several tech companies. Elenatheodora.com
Iran R. Roman is a theoretical neuroscientist and machine listening scientist at New York University’s Music and Audio Research Laboratory. Iran is a passionate instructor, with extensive experience teaching artificial intelligence and deep learning. His industry experience includes deep learning engineering internships at Plantronics in 2017, Apple in 2018 and 2019, Oscilloscape in 2020, and Tesla in 2021. Iran’s research has focused on using deep learning for speech recognition and auditory scene analysis. iranroman.github.io