Here are the feature sets that I experimented with!
All default features
Includes centroid, flux, RMS, and 20-d MFCCs (a sketch of this chain follows the fold results below)
fold 0 accuracy: 0.3657
fold 1 accuracy: 0.4098
fold 2 accuracy: 0.4123
fold 3 accuracy: 0.4186
fold 4 accuracy: 0.3863
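For reference, here is roughly what that default chain looks like; this is a minimal sketch assuming ChucK's ChAI-style analyzers (Centroid, Flux, RMS, MFCC) feeding a FeatureCollector, with the FFT size, hop, and per-excerpt averaging left as placeholders rather than my exact settings.

```
// sketch of the default feature chain (names per ChucK's ChAI analyzers)
adc => FFT fft;                      // or a SndBuf loaded with each excerpt
FeatureCollector combo => blackhole; // concatenates the upchucked features

fft =^ Centroid centroid =^ combo;   // 1 dim
fft =^ Flux flux =^ combo;           // 1 dim
fft =^ RMS rms =^ combo;             // 1 dim
fft =^ MFCC mfcc =^ combo;           // 20 dims

4096 => fft.size;
Windowing.hann(4096) => fft.window;
20 => mfcc.numCoeffs;                // 13 for the smaller MFCC set below

while( true )
{
    combo.upchuck();                    // analyze one frame
    combo.fvals() @=> float frame[];    // 23-d vector (1 + 1 + 1 + 20)
    // ...average frames per excerpt, then hand the vectors to the classifier...
    (fft.size()/2)::samp => now;        // hop
}
```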
13 MFCCs
Same as above, but with only 13-d MFCCs
fold 0 accuracy: 0.4039
fold 1 accuracy: 0.4093
fold 2 accuracy: 0.3833
fold 3 accuracy: 0.3691
fold 4 accuracy: 0.4235
Note: accuracy holds up with only 13 MFCCs (the fold average is about 0.40 in both cases), which could be because the other features carry relatively more weight here. In the 20-d version, MFCCs occupy 20 of the 23 dimensions (versus 13 of 16 here), so they have a much greater effect on the proximity of vectors.
Just add more features!
Adds Kurtosis, zero crossings, and xcorr to the default set (a zero-crossing helper is sketched after these results)
fold 0 accuracy: 0.4127
fold 1 accuracy: 0.4250
fold 2 accuracy: 0.3765
fold 3 accuracy: 0.4422
fold 4 accuracy: 0.4309
Note: adding these extra features slightly improves the accuracy (fold average about 0.42, up from about 0.40 for the default set).
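Kurtosis and xcorr come from their own analyzers, but the zero-crossing count is simple enough to compute by hand. The helper below is only an illustration of one way to get it from a captured time-domain window; the name and the buffer are placeholders, not code from my actual setup.

```
// count sign changes in one analysis window (win is a placeholder buffer of samples)
fun int zeroCrossings( float win[] )
{
    0 => int count;
    for( 1 => int i; i < win.size(); i++ )
    {
        if( (win[i-1] < 0 && win[i] >= 0) || (win[i-1] >= 0 && win[i] < 0) )
            count++;
    }
    return count;
}
```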
Add chroma
Further adds 12 chroma dimensions, one for each half step (see the chain extension after these results)
fold 0 accuracy: 0.4020
fold 1 accuracy: 0.4118
fold 2 accuracy: 0.3819
fold 3 accuracy: 0.3858
fold 4 accuracy: 0.4348
Note: adding chroma did not help; the fold average actually slips to about 0.40 from about 0.42 without it. This may indicate that pitch class doesn't do much to disambiguate genre.
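In chain terms this is just one more analyzer upchucked into the collector, something like the line below (again assuming the ChAI Chroma analyzer and the same fft/combo chain as in the earlier sketch):

```
fft =^ Chroma chroma =^ combo;   // appends 12 chroma dims, one per pitch class
```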
Only chroma
Nothing but the chroma dimensions!
fold 0 accuracy: 0.2294
fold 1 accuracy: 0.2255
fold 2 accuracy: 0.2044
fold 3 accuracy: 0.2319
fold 4 accuracy: 0.2569
Note: the fact that chroma alone still does better than random (fold average about 0.23) does suggest that pitch class may correlate (weakly) with genre. I might look more into the trends and the average chroma distribution for certain genres.
Phase Two: Designing an Audio Mosaic Tool
Database
My audio data is sourced from LSJUMB albums (contact me if you'd like pw access). For this milestone, I've gathered a few of our songs that fall into the "funk" genre, including Flashlight, Stuff Like That, What is Hip, and Whisper Your Name.
Interface
The interface I developed converts MIDI keyboard input into output from audio files that prominently contain the played pitch. For example, pressing the "U" key, which corresponds to Bb, produces fragments of the original songs that are the most "Bb-like."
This is accomplished as a similarity retrieval over the Chroma object (and only the Chroma object; no additional features are currently used). When the user presses a mapped key, I alter the frequency parameter of a SinOsc input to match that pitch. Since a pure sine's chroma is essentially all zeros except for a 1 at the target pitch class, this amounts to a nearest-neighbor search across the space with a one-hot query vector. The result is actually quite good: you can pretty clearly make out the pitch.
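To make that concrete, here is a minimal sketch of the retrieval step as described: a one-hot chroma query and a brute-force nearest-neighbor scan. The database layout (one 12-d chroma vector per stored fragment) and the names here are placeholders, not identifiers from my actual code.

```
// find the stored fragment whose chroma vector is closest to a one-hot query
fun int nearestFragment( float db[][], int pitchClass )
{
    // query: all zeros except a 1 in the target pitch class
    float query[12];
    for( 0 => int p; p < 12; p++ ) 0.0 => query[p];
    1.0 => query[pitchClass];

    -1 => int best;
    0.0 => float bestDist;
    for( 0 => int f; f < db.size(); f++ )
    {
        0.0 => float d;
        for( 0 => int i; i < 12; i++ )
            (db[f][i] - query[i]) * (db[f][i] - query[i]) +=> d;
        if( best < 0 || d < bestDist ) { d => bestDist; f => best; }
    }
    return best;   // index of the most "Bb-like" (etc.) fragment
}
```

A brute-force scan is fine at the scale of a few songs; if the database grows, something like ChucK's built-in KNN2 object could do the same lookup.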
Next Steps
I would love to build out the following features in phase three:
Better keyboard behavior: right now, the "midi keyboard" idea doesn't actually function like a keyboard. Instead, it remembers the last key you pressed and continually makes sound. I'd like to only play a note when a key is pressed (see the sketch after this list).
Polyphony support: right now, there can only be one input signal at a time. I'd like to chuck multiple SinOscs into the FFT and extend the keyboard handling to allow multiple simultaneous notes, creating "chords" of sound.
More features: I'd like to use more features in my similarity retrieval. Of course, this means I would not want to use a SinOsc as the input, since its FFT is quite boring. I could instead try different input signals, or experiment with filters on other UGens to target other musical qualities, such as timbre.
Smoother listening experience: Jumping from sample to sample is quite jarring in the current version, especially since the panning is all over the place and my code isn't diligent about syncing up the different layers of sound
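For the keyboard behavior in particular, ChucK's Hid keyboard events already separate key-down from key-up, so the fix is mostly bookkeeping. A minimal sketch follows; playFragment and stopFragment are hypothetical stand-ins for my playback code, not functions that exist yet.

```
// trigger sound only on key-down, stop on key-up
Hid hi;
HidMsg msg;
if( !hi.openKeyboard( 0 ) ) me.exit();   // open keyboard device 0

while( true )
{
    hi => now;                  // wait for a keyboard event
    while( hi.recv( msg ) )
    {
        if( msg.isButtonDown() )
        {
            // msg.which identifies the key; map it to a pitch class and start a note
            // playFragment( msg.which );
        }
        else if( msg.isButtonUp() )
        {
            // stopFragment();   // release instead of droning on the last key
        }
    }
}
```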
Demo Audio (sneak peek!)
Here's some demo audio - I'm using the closest-neighbor setting to show the pitch accuracy, and then towards the end I mess around with increasing the number of neighbors sampled by the program.