Homework 2
Leyth Toubassy
February 7th, 2023
Music 356 / CS470, Stanford University

Homework 2 - Feature Artists
This project was definitely a learning experience for me. I was inspired by Destiny's live-action trailers, with the awesome Led Zeppelin tracks in the background, and I wanted to bring some of that feeling back into the actual gameplay. I spent hours trying to tune it so that you could actually hear distinct noises in game, and try as I might, it felt like nothing would give me the results I wanted. Then, very suddenly, it clicked: while testing I happened across a set of values that led to really accurate replication of the in-game noise, and I was honestly blown away by the power of the relatively simple system I had trained.

My musical statement isn't super musical, quite honestly; it's instead designed to show the range of the system in a relatively dynamic way, from single distinct sounds to a cacophony of explosions and (digital) destruction. I did find that accuracy and my initial vision were pretty diametrically opposed: either my model could accurately replicate the sound effects of the game, or it could more closely resemble the backing tracks that inspired this project as a whole.

Originally intended to be two modes of the same script, the project instead has two data sets (trained with the same extractor) that are meant to be run through different ChucK scripts. One data set is the original songs, and the other is the trimmed audio from the trailers themselves. I'm going to work on making one script with a toggle, but for now the two scripts will have to do. I used a piece of software called Virtual Audio Cable to route Destiny's audio output into ChucK. I had a ton more footage to showcase, but I wanted to keep the video shorter; plus, the video editing software I had to use was the worst piece of software I've ever used (but hey, it did everything I needed it to do).
I also forgot to mention this in class, but the synthesis is done live, not on a recording of gameplay!
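The one-script-with-a-toggle idea could be as simple as selecting which trained data set file to load at startup. Here's a minimal sketch of that in Python for illustration only; the actual scripts are in ChucK, and the mode names and file names below are hypothetical.

```python
# Hypothetical sketch of the planned single-script toggle: pick which
# trained data set to load from a command-line argument.
MODELS = {
    "reconstruction": "reconstruction-dataset.txt",  # trimmed trailer audio
    "zeppelin": "zeppelin-dataset.txt",              # the original songs
}

def pick_model(argv):
    """Return the data set file for the requested mode (default: reconstruction)."""
    mode = argv[1] if len(argv) > 1 else "reconstruction"
    if mode not in MODELS:
        raise SystemExit(f"unknown mode: {mode} (expected one of {sorted(MODELS)})")
    return MODELS[mode]
```

Since both data sets were trained with the same extractor, the rest of the script wouldn't need to change at all, only which file gets loaded.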

Reconstruction Data Set
I used the parts of these videos that are backed by the cool music; you'll know them when you hear them :)
Reconstruction mode is backed by roughly four minutes of video.
Zeppelin mode is backed by trimmed versions of "Black Dog" and "Immigrant Song".
Milestone 1
For this milestone I tried each of the following feature configurations:
Just Chroma - pretty poor, around 0.2 accuracy - makes sense, it's only one feature.
Usual 23 + Chroma - around 0.35. I was surprised this was worse than the default 23-dimension set on its own; I surmise chroma is just less useful for classification here.
Just 30 MFCCs - around 0.33. This was surprising because of just how high the accuracy was using MFCCs alone.
Usual 23 w/o the 20 MFCCs - around 0.24. This was super low; clearly the MFCCs were doing some heavy lifting.
Usual 23 with only 10 MFCCs - around 0.35 to 0.4 depending on the fold, and consistently out-scoring the sample configuration. I was surprised because I had kind of assumed more dimensions = more accuracy, but it seems that was wrong.
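For anyone curious what the "fewer dimensions can score higher" result looks like mechanically, here is a minimal leave-one-out sketch in Python. This is illustration only: the homework itself used ChucK's extractor and classifier, and the toy feature vectors below are made up. The noisy third dimension stands in for uninformative extra features; dropping it raises accuracy.

```python
import math

def nearest_neighbor_predict(train, query):
    """Return the label of the training example closest to query (1-NN, Euclidean)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(train, key=lambda ex: dist(ex[0], query))[1]

def loo_accuracy(examples):
    """Leave-one-out accuracy of 1-NN over (features, label) pairs."""
    correct = 0
    for i, (feats, label) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        if nearest_neighbor_predict(rest, feats) == label:
            correct += 1
    return correct / len(examples)

def select_dims(examples, dims):
    """Keep only the listed feature dimensions (e.g. drop chroma, keep MFCCs)."""
    return [([f[d] for d in dims], label) for f, label in examples]

# Toy data: dims 0-1 separate the classes cleanly; dim 2 is pure noise.
data = [([0.0, 0.0,  5.0], "a"), ([0.1, 0.0, -3.0], "a"),
        ([1.0, 1.0,  4.0], "b"), ([0.9, 1.1, -5.0], "b")]
```

On this toy set, `loo_accuracy(select_dims(data, [0, 1]))` beats `loo_accuracy(data)` because the noisy dimension dominates the distance metric, the same shape of effect as the 10-MFCC configuration out-scoring the larger ones above.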

Part 2
After an unfortunate number of odd technical errors and compatibility issues, I wasn't really able to get the synthesizer working quite how I want it to. The goal is to have an "intensity" slider controlled by the mouse, which is trained by hand during the extraction process, but I'm pretty sure the additional dimension is throwing a spanner into the works, as you can kind of see in the video. There is mouse input in that recording, but of course OBS couldn't see my mouse for some reason :|. Apologies for the absolutely repugnant recording; it seems ChucK just hates existing on my computer.

Run extract in non-silent mode and move the mouse to control the "intensity" slider while training.
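One possible culprit for the extra dimension throwing a spanner into the works is scaling: in a nearest-neighbor-style feature space, an unnormalized dimension can dominate (or vanish from) the distance metric relative to the audio features. A minimal sketch of normalizing and weighting the mouse-driven value before appending it, written in Python for illustration (the real scripts are in ChucK, and these function names are hypothetical):

```python
def normalize_mouse_x(x, screen_width):
    """Map a raw mouse x position to a 0..1 'intensity' value, clamped."""
    return max(0.0, min(1.0, x / screen_width))

def append_intensity(frame, intensity, weight=1.0):
    """Append the hand-trained intensity as one extra feature dimension.

    `weight` scales the new dimension so it neither dominates nor
    disappears in the distance metric next to the audio features.
    """
    return list(frame) + [intensity * weight]
```

Tuning `weight` relative to the spread of the extracted features would control how strongly the slider steers retrieval versus the audio match itself.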