From CCRMA Wiki
Revision as of 10:47, 10 December 2009 by Kmontag (Talk | contribs)



Kevin Montag's Music 256A Final Project, Fall 2009


Intuition is a program that seeks to make audio production more intuitive. It focuses on transforming existing audio to give it desired perceptual qualities - a particular smoothness, brightness, and so on.

The user starts by specifying some songs that she likes, and what she likes about them - the great shimmer to that new Jay-Z single, or the starkness of that old Johnny Cash ballad. Then she specifies the tools she wants to use to achieve that kind of sound, in the form of LADSPA plugins. Finally, she specifies some sonic qualities that she expects to be relevant when the program decides what to do with those plugins when faced with sounds it hasn't seen before.
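A "sonic quality" like brightness is typically measured as a numeric feature computed from the audio. As a hedged illustration (the page doesn't say which features Intuition uses), the spectral centroid is a standard proxy for brightness:

```python
import numpy as np

def spectral_centroid(frame, sample_rate=44100):
    """Spectral centroid of one audio frame -- a common proxy for
    perceived 'brightness' (a higher centroid means a brighter sound)."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    if spectrum.sum() == 0:
        return 0.0
    return float((freqs * spectrum).sum() / spectrum.sum())

# A high-frequency sine has a higher centroid than a low-frequency one.
t = np.arange(2048) / 44100.0
dark = np.sin(2 * np.pi * 220 * t)
bright = np.sin(2 * np.pi * 8000 * t)
print(spectral_centroid(dark) < spectral_centroid(bright))  # True
```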

Intuition then uses machine learning algorithms to learn a mapping from qualities of an input sound to parameters for the LADSPA plugins, allowing it to take arbitrary inputs and make them sound more like that Johnny Cash song in real time.
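The page doesn't specify the learning algorithm (that work is part of the author's CS229 project), but the shape of the problem - map a feature vector of the input sound to a vector of plugin parameters - can be sketched with a simple stand-in such as k-nearest-neighbor regression over hypothetical training pairs:

```python
import numpy as np

# Hypothetical training data: rows of input-sound features paired with
# plugin-parameter settings that produced a good match. The values and the
# k-NN regressor are illustrative stand-ins, not the project's actual method.
train_features = np.array([[0.2, 0.1], [0.8, 0.9], [0.5, 0.4]])
train_params = np.array([[0.1], [0.9], [0.5]])

def predict_params(features, k=2):
    """Map a new feature vector to plugin parameters by averaging the
    parameters of the k nearest training examples."""
    dists = np.linalg.norm(train_features - features, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_params[nearest].mean(axis=0)

print(predict_params(np.array([0.25, 0.15])))  # averages the two nearest rows
```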


Intuition has a relatively minimal UI. The user makes four selections - input features, plugins to use, songs to mimic, and features of those songs to target - and then presses "Go!" The algorithm does its work to learn a mapping - this is fairly time-consuming - and then opens JACK ports at the inputs and outputs of the plugin chain that the user has specified. While audio is being fed to the program, the current values of the plugins' control parameters are displayed, along with a happy or sad face to let the user know how well the program is matching her desired features. The program does its best to display all the necessary information without feeling cluttered.

In the future, options will be added to fine-tune the parameters of the machine learning algorithm directly from the interface, but these won't be placed in a prominent spot.

The user will specify collections of sound files to be used as "seeds" for the audio transformations. Each collection will show up as an icon in the main window of the interface, and the user can click on the icon to edit the collection (add and remove sounds from it), or click somewhere else to add a new collection. These collections can be saved and loaded.

The program will be JACK-aware; for each instance of the program, the user will choose a single input to which the transformation will be applied, and a single output to which it will be sent.

The main window will consist of one section containing the available collections, and another section containing the "active" collections. The user drags collection icons in and out of the active space, and then clicks on an icon in the active space to specify the ways in which that collection should be used to affect the sound. When the user clicks on an active collection, I'm envisioning a set of sliders which can be used to say how much each particular audio parameter (shimmer, etc) should be "influenced" by that collection.


Sound qualities will be applied by taking short-time FFTs of the incoming signal and applying transformations that make each FFT more closely "match" the specified collection with respect to some particular quality of the sound. The matching will be performed using an algorithm that I'll be designing as part of my CS229 final project.
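Since the matching algorithm itself is still being designed, here is only a minimal sketch of the per-frame spectral step described above: nudge one frame's magnitude spectrum toward a target spectrum (e.g. an average spectrum drawn from a seed collection) while keeping the frame's own phase. The interpolation scheme is an assumption for illustration:

```python
import numpy as np

def match_frame(frame, target_mag, alpha=0.5):
    """Move one frame's magnitude spectrum toward a target magnitude
    spectrum, keeping the frame's phase. alpha=0 leaves the frame
    unchanged; alpha=1 fully adopts the target magnitudes."""
    spectrum = np.fft.rfft(frame)
    mag = np.abs(spectrum)
    phase = np.angle(spectrum)
    new_mag = (1 - alpha) * mag + alpha * target_mag
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(frame))

frame = np.random.default_rng(0).standard_normal(1024)
target = np.ones(513)  # flat target spectrum, purely for illustration
out = match_frame(frame, target, alpha=1.0)
# With alpha=1 the output's magnitude spectrum equals the target.
print(np.allclose(np.abs(np.fft.rfft(out)), target))  # True
```

In a real-time setting this would run on overlapping windowed frames with overlap-add resynthesis; that machinery is omitted here for brevity.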


Milestone 1: Get a framework up and running for reading/writing files, extracting features, processing collections of files, etc.

Milestone 2: Implement transformation of input files via convolution with centroids of another collection.

Milestone 3: Build a user interface that allows for editing/creation of genres, displays the user's available LADSPA plugins, and gives feedback about JACK connection information.

Milestone 4: Link the UI with the feature-based synthesis work I'll be doing in 229.

Milestone 5: Polish everything up.
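One hedged reading of Milestone 2 - convolving an input with the "centroid" of a collection - is to average the magnitude spectra of the collection's sounds and apply that average as a zero-phase filter (multiplication in the frequency domain is circular convolution in time). The details below are assumptions for illustration, not the project's actual implementation:

```python
import numpy as np

def collection_centroid(collection):
    """Average magnitude spectrum ('centroid') of a collection of
    equal-length sound frames."""
    return np.mean([np.abs(np.fft.rfft(x)) for x in collection], axis=0)

def apply_centroid(signal, centroid):
    """Filter a signal by the centroid spectrum: frequency-domain
    multiplication, i.e. circular convolution with the centroid's
    zero-phase impulse response."""
    spectrum = np.fft.rfft(signal)
    return np.fft.irfft(spectrum * centroid, n=len(signal))

rng = np.random.default_rng(1)
collection = [rng.standard_normal(1024) for _ in range(4)]
centroid = collection_centroid(collection)
out = apply_centroid(rng.standard_normal(1024), centroid)
print(out.shape)  # (1024,)
```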