From CCRMA Wiki
Revision as of 15:51, 16 December 2011 by Dt (Talk | contribs) (experimental design)

sour mash

The name is partially derived from the idea of mashing up songs, except that the snippets I use can vary from unintelligibly small to slightly longer and more intelligible, which may or may not give the result an acerbic flavor. The other reason is that sour mash is one method used to produce the nectar of the gods, whiskey. I also like to think about this synthesis/exploratory process as a "distillation" of sorts.


The sour mash spun out of my desire to better understand feature representations of music. MIR (music information retrieval) researchers have found that incorporating larger and larger feature sets improves performance on classification and retrieval tasks. While this is not all that surprising, I've been frustrated by the kitchen-sink approach to analyzing music. Kitchen-sink systems may outperform other, more elegant approaches, but I believe that if we better understand the current limitations of our feature sets, we will be able to build better features. This project is an attempt to illuminate what information is actually captured by widely used feature representations of music. And I think I may have stumbled across a pretty cool way to make music.

software design

To speed up the database, I implemented a variation of SQLite that indexes rows for range queries (the kind of index typically used for spatial searches). The 5 features that I selected (rather arbitrarily) were the first 5 MFCCs. Because C++ is the Dale Earnhardt (RIP) of the programming circuit, I had to abandon my beloved Python. I used a JUCE GUI to deal with buttons, sliders, boxes, and even a little bit of OpenGL. I also relied on JUCE's sliders to sidestep the finicky process of parameter selection. In particular, it's hard to find the optimal range to use when searching for points in the DB. If the range is too narrow, you won't find anything. If it's too wide, the audio buffer slows down. I demonstrate the software with three source songs (in the database) and three input songs. Ideally, I'd be able to scale everything up real big. But alas, that's for another day.
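To make the range trade-off concrete, here is a minimal sketch of a range query over 5-dimensional feature rows. It is written in Python with the standard sqlite3 module and a plain table, not the C++ code or the specialized index described above. The table name `frames`, the columns `c1` to `c5`, the function names, and the demo values are all assumptions made up for illustration.

```python
import sqlite3

N_FEATURES = 5  # the first 5 MFCCs, as in the text


def build_db(frames):
    """Load (song, frame_index, [c1..c5]) tuples into an in-memory table.

    The schema is hypothetical; the real project used its own indexed variant.
    """
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"c{i} REAL" for i in range(1, N_FEATURES + 1))
    conn.execute(f"CREATE TABLE frames (song TEXT, frame INTEGER, {cols})")
    placeholders = ", ".join("?" for _ in range(N_FEATURES + 2))
    conn.executemany(
        f"INSERT INTO frames VALUES ({placeholders})",
        [(song, idx, *mfcc) for song, idx, mfcc in frames],
    )
    return conn


def range_query(conn, target, radius):
    """Return (song, frame) rows whose every feature is within +/- radius of target."""
    where = " AND ".join(
        f"c{i} BETWEEN ? AND ?" for i in range(1, N_FEATURES + 1)
    )
    params = []
    for value in target:
        params.extend((value - radius, value + radius))
    sql = f"SELECT song, frame FROM frames WHERE {where} ORDER BY song, frame"
    return conn.execute(sql, params).fetchall()


# Made-up feature values, only to show how the radius changes the result size.
demo_frames = [
    ("a", 0, [0.0, 0.0, 0.0, 0.0, 0.0]),
    ("a", 1, [1.0, 1.0, 1.0, 1.0, 1.0]),
    ("b", 0, [0.2, 0.1, 0.0, -0.1, 0.3]),
    ("c", 0, [5.0, 5.0, 5.0, 5.0, 5.0]),
]
demo_conn = build_db(demo_frames)
origin = [0.0] * N_FEATURES

narrow = range_query(demo_conn, origin, 0.05)  # only the exact match survives
medium = range_query(demo_conn, origin, 0.5)   # picks up a nearby frame too
wide = range_query(demo_conn, origin, 10.0)    # returns the whole table
```

With a narrow radius the query can come back nearly empty, and with a wide one it returns every row, which is the tension the slider is meant to manage. One possible refinement, not described in the original, would be to start narrow and widen the radius until at least one row comes back.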

screen shot!

future directions

As I hinted in my demonstration (by sampling Getting Better by the Beatles, Faster by Janelle Monae, and Stronger by Kanye West), it needs to be better, faster, stronger. Mostly faster. If I were a little smarter about threading and overlap-adding, and if I got my database up to snuff, this could be much more compelling. I considered getting rid of the database altogether and implementing some sort of tree or other multidimensional data structure. It'd also benefit from some sort of dimension reduction algorithm (PCA would've been nice) to incorporate more features without sacrificing speed. From a more experiential standpoint, interaction with the visualizer could allow the user to actively explore the feature space. Some ideas are to provide ways for the user to sonify samples or clusters of samples, or even to synthesize new pieces by drawing paths through the feature space. The potential for sprawl is endless.
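Of the improvements listed above, overlap-adding is the easiest to illustrate in a few lines. The following is a rough sketch in Python, not the project's C++ audio code. It assumes equal-length snippets ("grains"), a periodic Hann window, and a hop of half the grain length; the function names `hann` and `overlap_add` are hypothetical.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(grains, hop):
    """Window each equal-length grain and sum it into the output, hop samples apart."""
    if not grains:
        return []
    size = len(grains[0])
    window = hann(size)
    out = [0.0] * (hop * (len(grains) - 1) + size)
    for k, grain in enumerate(grains):
        start = k * hop
        for i, sample in enumerate(grain):
            out[start + i] += window[i] * sample
    return out


# Four constant grains of 8 samples at 50% overlap (hop = 4).
demo_grains = [[1.0] * 8 for _ in range(4)]
demo_out = overlap_add(demo_grains, hop=4)
```

At 50% overlap the shifted Hann windows sum to one, so wherever two grains overlap the constant input is reconstructed without a gain bump, and each snippet fades in and out instead of clicking at its edges. Only the first and last half-grain of the output are tapered.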