HW2: Featured Artist | Priya Sundaresan

Phase 1

In phase one, I experimented with a few different feature combinations for feature extraction and cross-validation. First, I tried removing features from the classifier and found that this substantially hurt performance, as expected: a genre classifier trained on RMS alone scored around 0.15, which is just above random. Next, I tried tuning various hyperparameters to see if I could improve on the default performance. Reducing the number of MFCC coefficients and filters to 5 each hurt performance, though not by much; I suspect that lowering these numbers caused the model to underfit the data, especially in the fewer-coefficients case. Because of this, I reverted to the default parameters but tried adding features such as Chroma, which boosted cross-validation performance to 0.46. Lastly, I added RollOff (cutoff at 0.85) to the feature set as well. With both RollOff and Chroma, I got cross-validation up to around 0.48.
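To make this concrete, here's a minimal ChucK sketch of the feature chain I ended up with (MFCC at the defaults, plus Chroma and RollOff at 0.85). The variable names and the 20/10 MFCC numbers are stand-ins of my own; the real extraction lives in the starter code's mosaic-extract.ck:

    // sketch of the final feature chain: MFCC (defaults) + Chroma + RollOff
    // (names and the 20/10 values are stand-ins; see mosaic-extract.ck)
    adc => FFT fft =^ MFCC mfcc =^ FeatureCollector combo => blackhole;
    fft =^ Chroma chroma =^ combo;
    fft =^ RollOff rolloff =^ combo;

    4096 => fft.size;                    // analysis window size
    Windowing.hann(4096) => fft.window;

    20 => mfcc.numCoeffs;                // reverting from the 5/5 experiment,
    10 => mfcc.numFilters;               // which underfit

    0.85 => rolloff.percent;             // rolloff cutoff at 85% of spectral energy

    while( true )
    {
        combo.upchuck();                 // one combined feature vector per hop
        (fft.size()/2)::samp => now;
    }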

Phase 2 Explorations

My goal in this project is to have an interactive guitar jam sesh with myself with the help of Chuck. To do this, I curated a set of audio files that are recordings of me playing the electric guitar. I didn't really play any specific songs; mostly just chords, though I tried to solo a little bit even though I'm not great :).

The goal is to have a system where I can start playing something, and Chuck can perform kNN retrieval on what I'm playing and play back the most similar snippet that I recorded. I wanted the system to mimic a jam session with a friend, where one person is usually playing chords and the other is soloing, and you take turns listening to what the other person is playing and trying to complement their sound.

Using the features that worked well for me in Phase 1, I recorded ten ~20-second snippets of myself playing guitar and trained the mosaic extractor on them. Then I experimented a bit with mosaic-synth-mic.ck, seeing what the default behavior was when I just started playing chords. One thing I didn't quite like was how short the synthesized snippets were, and how sensitive they were to every little sound I made. I also felt that layering multiple nearest sounds made it difficult to hear melodies. I ended up changing a number of parameters: the window of input frames (NUM_FRAMES), the synthesis window, and the number of nearest neighbors, which I set to 1 (so that it retrieves only the single most similar snippet). This made it easier for me to listen and play along with Chuck. I also added timing delays so that it would be more of a back-and-forth, where I'd play and Chuck would listen, and vice versa.
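Here's a toy ChucK sketch of the retrieval loop after my changes, assuming ChucK 1.5's KNN2 (which the starter code uses). The feature extraction and snippet playback are stubbed out, and the 2-D feature vectors are made up for illustration:

    // toy sketch of K=1 retrieval with turn-taking (not the real starter code;
    // feature extraction and snippet playback are stubbed out)
    KNN2 knn;

    // pretend feature vectors for 3 recorded snippets (2-D for illustration)
    [ [0.1, 0.2], [0.5, 0.4], [0.9, 0.8] ] @=> float features[][];
    [ 0, 1, 2 ] @=> int uids[];
    knn.train( features, uids );

    1 => int K;            // only the single nearest snippet
    int result[K];

    while( true )
    {
        // my turn: in the real system, NUM_FRAMES of live features get
        // averaged into this query vector (stubbed with fixed values here)
        [0.45, 0.5] @=> float query[];

        // Chuck's turn: retrieve the most similar snippet's id
        knn.search( query, K, result );
        <<< "would play back snippet", result[0] >>>;

        // delay so the exchange feels like a back-and-forth, not an echo
        2::second => now;
    }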

Here's a demo of the current system:


Things that didn't work:

Code

My code for phase 2/3 is available here. I first run "chuck --silent mosaic-extract.ck:input2.txt:OUTPUT" to extract features from my guitar playing. Then, I run "chuck perform.ck:OUTPUT".

Phase 3 Sketch

For phase 3, something I want to improve is adding variety to the synthesized sounds Chuck produces. Currently, it only performs retrieval and playback, but ultimately it would be cool for Chuck to be able to "solo" too by slightly modulating the synthesized snippets in some way.
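One direction I have in mind is a small random pitch/rate jitter on playback. Here's a toy ChucK sketch of the idea; the SndBuf playback and the file name are placeholders, not how the starter code actually synthesizes:

    // toy sketch: replay a retrieved snippet with a slight random pitch shift,
    // so Chuck's "solo" isn't an exact copy (file name is a placeholder)
    SndBuf buf => dac;
    "snippet0.wav" => buf.read;

    while( true )
    {
        // shift up or down by at most 2 semitones
        Math.random2f( -2.0, 2.0 ) => float semitones;
        Math.pow( 2.0, semitones / 12.0 ) => buf.rate;

        // restart and let the snippet play through at the new rate
        0 => buf.pos;
        (buf.samples() / buf.rate())::samp => now;
    }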

For the final deliverable, I want to be able to show a jam sesh between myself and Chuck that sounds smoother and more natural. Right now, I feel like the retrieval aspect is promising, but I still find the transitions between different snippets a bit choppy.

I also want to continue playing around with the way I do feature extraction, and see if it is possible to create a decent-sounding jam session using recordings from other artists I like. Ideally, doing retrieval over both my own recordings and other artists' would make for a much more interesting performance than the current prototype.

Phase 3 Reflection

My original vision going into this project was to have Chuck act as a guitarist so that we could mimic having a jam session together. To do this, I recorded myself playing guitar in snippets, ran feature extraction, and then used a Chuck program to listen to my live guitar, retrieve the nearest snippets, and play them back.

Originally, I wanted it to be much more of a leader-follower setup, where I would play and Chuck would play very similar things back to me, but I found the task of mapping the pitch of my live guitar to similar sounds in feature space to be very difficult. I experimented with different settings for the feature extraction and ended up using my best combination from phase 1 (with RollOff & Chroma). I also tried recording more snippets of myself playing guitar, to see if this would give the program more to draw from when doing feature matching, as well as adding some silent recordings. Despite these changes, I still couldn't quite get the mapping to be one-to-one like I had envisioned. One thing that did help a lot was recording all my snippets in the same key; when I tried recording truly random strumming and notes, the synthesized sound windows were very discordant. I had also originally wanted to record guitar snippets from other artists I like and train on those, but I found that training on sounds from my own guitar helped a lot with retrieval.

I feel that even though the project didn't turn out exactly how I envisioned it, it was still a really fun interactive performance. I could feel myself listening more to Chuck and trying to match what I was playing to what I was hearing, rather than just playing what I wanted in the hopes that Chuck would follow. Given more time, I would love to go a step further and really modulate the outputs that Chuck synthesizes instead of just doing retrieval.


Phase 3 Final Demo

Code

My code for phase 3 is available here. I first run "chuck --silent mosaic-extract.ck:input2.txt:OUTPUT" to extract features from my guitar playing. Then, I run "chuck perform.ck:OUTPUT".

Acknowledgements

Thanks to ChatGPT for making this HTML template for me :). Also thanks to Ge and Andrew for the starter code.