Featured Artist

by Andrew Lee

Music 356: Music and AI

Assignment Description
Phase 1: Extract, Classify, Validate

Extraction, Classification, and Validation:

Extract | Classify | X-Validate

To test "classify" and "X-validate", please put the following classifier models at the same directory level as the .ck programs.

Classifier Models:

Classifier 1: MFCC (20)

The idea behind this one is simplification. How well can it predict given only MFCC coefficients and nothing else? Surprisingly, it performs decently well compared to the other models below, yielding an average fold accuracy of around 0.35.
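For reference, the MFCC-only extraction patch looks roughly like this in ChucK (a minimal sketch; the FFT size and hop are assumed values, not necessarily those in the assignment code):

    // audio input => FFT => MFCC, 20 coefficients and nothing else
    adc => FFT fft =^ MFCC mfcc => blackhole;
    // analysis parameters (assumed)
    4096 => fft.size;
    Windowing.hann(4096) => fft.window;
    20 => mfcc.numCoeffs;

    while( true )
    {
        // compute one MFCC frame (this also triggers the upstream FFT)
        mfcc.upchuck() @=> UAnaBlob blob;
        // blob.fvals() now holds the 20-dimensional feature vector
        // advance by one hop (half the FFT size, assumed)
        (fft.size()/2)::samp => now;
    }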

Classifier 2: Centroid, Flux, RMS, RollOff, ZeroX, Chroma, Kurtosis, SFM, and MFCC (20)

The idea behind this one is to gather many features. Surely it can classify better when it knows more, right? This turns out to be one of the worst models here! It yields an average fold accuracy of around 0.10, which is basically random guessing. How could this be? Perhaps the model overfit given so many features; with a distance-based classifier like KNN, high dimensionality also makes all points look roughly equidistant, and features on larger scales can drown out the informative ones.
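Mechanically, combining many analyzers is easy in ChucK: a FeatureCollector concatenates whatever is chained into it. A sketch following the standard pattern (parameters are assumed):

    // one FFT feeds several unit analyzers; the FeatureCollector
    // concatenates their outputs into a single feature vector
    adc => FFT fft =^ Centroid centroid =^ FeatureCollector combo => blackhole;
    fft =^ Flux flux =^ combo;
    fft =^ RMS rms =^ combo;
    fft =^ MFCC mfcc =^ combo;
    // (additional analyzers are chained in the same way)

    4096 => fft.size;
    Windowing.hann(4096) => fft.window;
    20 => mfcc.numCoeffs;

    while( true )
    {
        // one upchuck computes every connected analyzer
        combo.upchuck() @=> UAnaBlob frame;
        <<< "feature dimensions:", frame.fvals().size() >>>;
        (fft.size()/2)::samp => now;
    }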

Classifier 3: Centroid, Flux, RMS, Chroma

This model uses features that I personally find intuitive for determining a musical genre, such as the "brightness" of sound, its "timbre", and how sound progresses. With 15 dimensions (Centroid, Flux, and RMS are one each; Chroma contributes the remaining 12, one per pitch class), this model has an average fold accuracy of 0.29. I'm surprised that it didn't do horrendously given so few features.

Classifier 4: ZeroX, RollOff, Kurtosis, SFM, and MFCC (20)

This model is the opposite of Classifier 3. It contains only features that I don't think will help in determining a musical genre. Disappointingly, it did indeed do poorly, with an average fold accuracy of 0.10. Perhaps this is again due to having too many features (it has 47).

Classifier 5: Centroid, Flux, RMS, RollOff, SFM, and MFCC (20)

This model is my best effort to create a high-accuracy classifier. I started off with just Centroid, Flux, and MFCC, then gradually added features, testing the results of each new one to decide whether it should be kept or discarded. With 48 dimensions, it has an average fold accuracy of 0.45. This made me think that having many dimensions may not be a bad thing, provided the right features are used.
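For context, each fold accuracy above comes from training on the other folds and scoring the held-out one. A rough sketch of one fold with ChucK's KNN2 (the train/test arrays are hypothetical placeholders for one fold's split, K and the genre count are assumptions, and the predict() argument order follows the ChAI examples as I recall them):

    // hypothetical placeholders for one fold's train/test split
    float trainX[0][0]; int trainY[0];
    float testX[0][0];  int testY[0];

    KNN2 knn;
    knn.train( trainX, trainY );

    10 => int K;           // number of neighbors (assumed)
    10 => int NUM_GENRES;  // implied by chance accuracy of ~0.10
    float probs[NUM_GENRES];
    0 => int correct;

    for( 0 => int i; i < testX.size(); i++ )
    {
        // per-genre probabilities over the K nearest neighbors
        knn.predict( testX[i], K, probs );
        // take the argmax as the predicted genre
        0 => int best;
        for( 1 => int g; g < NUM_GENRES; g++ )
            if( probs[g] > probs[best] ) g => best;
        if( best == testY[i] ) 1 +=> correct;
    }
    <<< "fold accuracy:", (correct $ float) / testX.size() >>>;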

Phase 2: Audio Mosaic Tool

This mosaic uses music from The Greatest Showman as its database, namely "The Greatest Show", "Tightrope", and "Never Enough". The mosaic lets you switch between the songs through keyboard controls. Additional controls include changing the playback rate, freezing the mosaic to loop on a specific audio window, and playing the original song as-is starting from a specific audio window.
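The keyboard handling follows ChucK's standard Hid pattern. Here is a minimal sketch (the key bindings and the setSong()/toggleFreeze() handlers are hypothetical stand-ins for the actual control logic):

    // hypothetical handlers standing in for the real mosaic controls
    fun void setSong( int which ) { <<< "song:", which >>>; }
    fun void toggleFreeze() { <<< "freeze toggled" >>>; }

    // standard ChucK HID keyboard loop
    Hid hid;
    HidMsg msg;
    // open keyboard device 0; bail if unavailable
    if( !hid.openKeyboard( 0 ) ) me.exit();

    while( true )
    {
        hid => now;
        while( hid.recv( msg ) )
        {
            if( msg.isButtonDown() )
            {
                // hypothetical bindings: 1-3 pick a song, F toggles freeze
                if( msg.ascii == "1".charAt(0) ) setSong( 0 );
                else if( msg.ascii == "2".charAt(0) ) setSong( 1 );
                else if( msg.ascii == "3".charAt(0) ) setSong( 2 );
                else if( msg.ascii == "F".charAt(0) ) toggleFreeze();
            }
        }
    }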

The Code:

Phase 2 Directory

Demo Video:

Here's a not-very-good demo of my mosaic tool. Why not very good? I recorded this at night, and a sudden sense of embarrassment (fear of my roommate asking if I'm okay after I start screaming and making weird breathing noises at random) hindered my ability to effectively demonstrate the capabilities of this mosaic tool. But anyway, this demo is now pretty out of date. Check out Phase 3 instead!

Phase 3: Music Statement

This is a duet performance between me and my mosaic, except I'm both the one playing the piano and the one playing the mosaic, if you consider the mosaic an instrument rather than a musician. I couldn't think of a cool way to combine the sounds of all three songs together smoothly, so this video is really me "remixing" each of the songs in the mosaic one by one. Disregarding my hand posture, please enjoy.

Acknowledgments

Code: Acknowledgments to Ge and Yikai for providing essentially all of the code used in this assignment.

Data: Acknowledgments to the source music used for this project: "The Greatest Show", "Tightrope", and "Never Enough".

Reflection

This was definitely a cool project. I started off feeling lost, though, because I couldn't think of anything more to do with the mosaic than what the provided code already did. But after last week I got some inspiration, and I'm decently satisfied with the mosaic I have now.

I originally couldn't find a way to work with multiple songs together and still have the mosaic work well. While it did take in all the songs, only certain ones would get reproduced during synthesis. So I decided to make individual KNN models within the mosaic; this way I can work with whatever subset of sounds I want. One thing I wish I had added was a control to play sounds from each of these models simultaneously. I wonder what it would sound like to control three songs at once?
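Concretely, the per-song setup amounts to one KNN2 per source file, with an index selecting which model answers queries during synthesis. A minimal sketch under those assumptions (the feature loading is stubbed out with placeholder data):

    // one KNN2 model per source song
    ["the-greatest-show", "tightrope", "never-enough"] @=> string songs[];
    KNN2 models[songs.size()];
    0 => int active;  // which song's model is currently in use

    for( 0 => int s; s < songs.size(); s++ )
    {
        // placeholder: the real program loads this song's precomputed
        // feature file here (assume 100 windows x 48 dimensions)
        float X[100][48];
        // label each window with its own index, so a nearest-neighbor
        // search returns window IDs within this song
        int ids[X.size()];
        for( 0 => int i; i < X.size(); i++ ) i => ids[i];
        models[s].train( X, ids );
    }

    // at synthesis time, query only the active model; 'query' stands in
    // for the live input feature vector from the extractor
    float query[48];
    int result[1];
    models[active].search( query, 1, result );
    // result[0] is the nearest audio window's index within that song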

I'm not sure how I feel about my Phase 3. It was really hard to navigate between the piano and the mosaic; in a way, I was basically playing two instruments at once. I wish I could find a way to incorporate the sounds from the different song modes more effectively, so that it feels like I'm playing with one mosaic rather than working with three separate models.

Another thing I wish I had done was give non-piano audio input to the synth program. I suppose I was too embarrassed to make weird noises at it, but through the experience I did explore a lot of cool sounds I could get from the model. My last concern was the balance between playing the actual songs and generating responsive music: for some parts, I felt like I spent a bit too much time playing the actual songs and wasn't really showcasing the AI side of the model.

Nonetheless, what's done is done. I enjoyed the underlying concepts behind this assignment a lot. Beforehand, I wouldn't have realized how much creativity and interaction this type of technology still requires from me.