NeuralNote: An Audio-to-MIDI Plugin Using Machine Learning
Abstract: NeuralNote is an open-source VST/AU plugin that uses machine learning for accurate audio-to-MIDI transcription. This talk will begin with an in-depth look at BasicPitch, the machine learning model from Spotify that powers NeuralNote. We will explore its internal workings and how it processes audio to generate MIDI data. Next, we will cover the integration of BasicPitch into the NeuralNote plugin, implemented in C++ using the JUCE framework. We will discuss the challenges of incorporating neural network inference in audio plugins, focusing on real-time processing, thread safety, and performance. A comparison of the ONNXRuntime and RTNeural libraries will highlight the options for neural network integration in this domain. Finally, we will outline the architecture of NeuralNote, detailing how its components—user interface, parameters, neural network, synthesizer, audio player—work together across different threads. This talk aims to provide a clear understanding of the development and integration of machine learning in audio plugins.
Bio: Damien Ronssin is a DSP and software engineer from France, currently working at Minimal Audio, where he develops audio plugins such as Current and MorphEQ. He is also the lead developer of the open-source audio-to-MIDI plugin NeuralNote. Prior to his current role, he worked at Logitech on real-time speech processing using machine learning, publishing research on voice conversion and speaker extraction. He holds a master’s degree in Computational Science and a bachelor’s degree in Mechanical Engineering from EPFL (Lausanne, Switzerland).
Zoom: Email jos at CCRMA by 3 PM on May 28 to obtain the Zoom link (users at CCRMA will receive it automatically).