The sour mash spun out of my interest in better understanding feature representations of music. MIR (music information retrieval) researchers have found that incorporating larger and larger feature sets improves performance on classification and retrieval tasks. While this is not all that surprising, I've been frustrated by the kitchen-sink approach to analyzing music. Maybe it's the fact that I'm not a pure engineer. Maybe it's because I believe that if we better understand the current limitations of our feature sets, we will be able to build better features. Either way, I wanted to dig into the feature representations of music that we know and love.
My initial goal was to create a database of "source songs" that would then be used to re-synthesize an input track in real time. I naively decided to use 25-50 columns, each of which would correspond to a feature of some small sample of audio. Ignoring the difficulty of real-time audio, I began by extracting 20 MFCCs (Mel Frequency Cepstral Coefficients) from each sample and storing them in a SQLite database. Then I tried to read in a new track, extract 20 MFCCs over the audio buffer, query the database for the closest match, and pull audio from the source song into the output buffer. This was a miserable failure. Every single aspect of the system was too slow. Like Michelangelo chiseling David out of an enormous block of marble, I began paring down the system I started with until it was functional in real time. The final system uses a database of a whopping 3 source songs and 5(!) features. For each track, I store audio buffers of varying sizes so I can experiment with shorter and longer sample sizes. It also visualizes a subset of the database in 3 dimensions to give the user a better idea of what the feature space looks like.
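To make that first, too-slow design concrete, here is a minimal Python sketch of the lookup loop described above. It is not the original code: the table layout, the names `build_frame_db` and `closest_frame`, and the random vectors standing in for real MFCCs are all assumptions made for illustration.

```python
import math
import random
import sqlite3

N_FEATURES = 20  # matches the 20 MFCCs of the first attempt


def build_frame_db(frames):
    """Store (song, start_sample, feature_vector) rows in an in-memory SQLite DB.

    The schema is hypothetical; the post does not describe the real one.
    """
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"f{i} REAL" for i in range(N_FEATURES))
    conn.execute(f"CREATE TABLE frames (song TEXT, start_sample INTEGER, {cols})")
    placeholders = ", ".join("?" for _ in range(N_FEATURES + 2))
    conn.executemany(
        f"INSERT INTO frames VALUES ({placeholders})",
        [(song, start, *vec) for song, start, vec in frames],
    )
    return conn


def closest_frame(conn, query):
    """Brute-force nearest neighbour: scan every row and keep the closest one."""
    best, best_dist = None, math.inf
    for row in conn.execute("SELECT * FROM frames"):
        song, start, vec = row[0], row[1], row[2:]
        dist = math.dist(vec, query)  # Euclidean distance in feature space
        if dist < best_dist:
            best, best_dist = (song, start), dist
    return best, best_dist


# Synthetic stand-ins for MFCC frames (assumption: real features would come
# from an audio analysis step that is not shown here).
rng = random.Random(0)
frames = [
    ("song_a", i * 512, [rng.gauss(0, 1) for _ in range(N_FEATURES)])
    for i in range(1000)
]
conn = build_frame_db(frames)
match, dist = closest_frame(conn, frames[42][2])
print(match, dist)
```

Every query touches every row, so the cost grows linearly with the size of the database, and that cost has to be paid once per audio buffer. That gives some intuition for why a design like this struggles in real time.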
To speed up the database, I implemented a variation of SQLite that indexes rows for range queries (the kind of index typically used for spatial searches). The 5 features that I selected (rather arbitrarily) were the first 5 MFCCs. Because C++ is the Dale Earnhardt (RIP) of the programming circuit, I had to abandon my beloved Python. I used a JUCE GUI to deal with buttons, sliders, boxes, and even a little bit of OpenGL. I also relied on JUCE's sliders to sidestep the finicky process of parameter selection. In particular, it's hard to find the optimal range to use when searching for points in the DB. If the range is too narrow, you won't find anything. If it's too wide, the search slows down and the audio buffer can't keep up. I demonstrate the software with three source songs (in the database) and three input songs. Ideally, I'd be able to scale everything up real big. But alas, that's for another day.
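The range trade-off can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the C++ system itself: it uses a plain SQLite table with `BETWEEN` filters rather than a true spatial index, the names `build_range_db` and `range_query` are hypothetical, and the points are random numbers standing in for the first 5 MFCCs.

```python
import random
import sqlite3

N_FEATURES = 5  # the final system keeps only the first 5 MFCCs


def build_range_db(points):
    """Load 5-dimensional points into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"f{i} REAL" for i in range(N_FEATURES))
    conn.execute(f"CREATE TABLE frames (id INTEGER PRIMARY KEY, {cols})")
    conn.executemany(
        "INSERT INTO frames VALUES (?, ?, ?, ?, ?, ?)",
        [(i, *p) for i, p in enumerate(points)],
    )
    # Ordinary B-tree index on one column: a stand-in only, not the
    # multi-dimensional index the post refers to.
    conn.execute("CREATE INDEX idx_f0 ON frames (f0)")
    return conn


def range_query(conn, query, radius):
    """Return every row inside the box query +/- radius on all 5 features."""
    where = " AND ".join(f"f{i} BETWEEN ? AND ?" for i in range(N_FEATURES))
    params = []
    for q in query:
        params += [q - radius, q + radius]
    return conn.execute(f"SELECT * FROM frames WHERE {where}", params).fetchall()


# Synthetic feature vectors, uniformly spread over [-1, 1] in each dimension.
rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(N_FEATURES)] for _ in range(20000)]
conn = build_range_db(points)

query = [0.0] * N_FEATURES
for radius in (0.01, 0.2, 1.0):  # plays the role of the GUI slider
    hits = range_query(conn, query, radius)
    print(f"radius={radius}: {len(hits)} candidate rows")
```

With a very small radius the box is likely to come back empty, while the widest box covers the whole synthetic data set, so every row becomes a candidate that still has to be compared. Exposing the radius as a slider lets the user find a workable middle ground by ear.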