WeebBlasters Mosaic

I had a blast finishing up this project. Compiling some of my favorite anime openers and closers into a concatenative-synthesis mosaic was very nostalgic for me. It was particularly interesting exploring how an anime opener and closer relate to one another, and seeing how a model built from the pairing manifests as a mosaic. Honestly, I'm keeping this writeup short because I don't want to spoil the fun of watching the video. It would have been lit to explore each pairing and hear all the sounds that come from combining just one opener and closer.


Mosaic Milestone 1

Phase 1

Phase 1 was a bit of a challenge because I wasn't sure exactly what I was looking for in terms of sound and the final output. After reading Perry Cook's paper, it seemed like the best model would depend heavily on the input and the desired output. So I tried a couple of things, first varying the number of MFCCs and seeing how that fared with the classifier. I confirmed that 5 MFCCs seemed adequate for decent accuracy. With a 21-feature extraction (20 MFCCs plus the spectral centroid), I was able to get pretty good results:

fold 0 accuracy: 0.4015

fold 1 accuracy: 0.4054

fold 2 accuracy: 0.3985

fold 3 accuracy: 0.4211

fold 4 accuracy: 0.4049
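To make the fold numbers concrete, here's a minimal sketch of the kind of k-fold validation loop that produces lines like these, using a 1-nearest-neighbor classifier on toy data (everything here is hypothetical, not my actual extractor or classifier):

```python
import numpy as np

def kfold_accuracy(features, labels, n_folds=5, seed=0):
    """Shuffle, split into n_folds, and score a 1-NN classifier per fold."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    folds = np.array_split(order, n_folds)
    accuracies = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        train_x, train_y = features[train_idx], labels[train_idx]
        correct = 0
        for i in test_idx:
            # Euclidean distance to every training vector; take the nearest label
            d = np.linalg.norm(train_x - features[i], axis=1)
            if train_y[np.argmin(d)] == labels[i]:
                correct += 1
        accuracies.append(correct / len(test_idx))
    return accuracies

# Toy data: two classes with slightly separated 6-dimensional feature clusters
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 1, (50, 6)), rng.normal(1, 1, (50, 6))])
y = np.array([0] * 50 + [1] * 50)
for k, acc in enumerate(kfold_accuracy(x, y)):
    print(f"fold {k} accuracy: {acc:.4f}")
```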

I also tried a model that avoided MFCCs altogether and only used spectral centroid, flux, and flatness (SFM). I'm really curious to see how the output would differ with this model:

fold 0 accuracy: 0.3877

fold 1 accuracy: 0.3838

fold 2 accuracy: 0.3775

fold 3 accuracy: 0.3412

fold 4 accuracy: 0.3760
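For reference, those three MFCC-free features can be sketched per FFT frame in NumPy (a hypothetical, simplified version of a real feature extractor):

```python
import numpy as np

def frame_spectra(signal, frame_size=512, hop=256):
    """Hann-windowed magnitude spectra, one row per frame."""
    n = 1 + (len(signal) - frame_size) // hop
    window = np.hanning(frame_size)
    frames = np.stack([signal[i*hop : i*hop + frame_size] * window for i in range(n)])
    return np.abs(np.fft.rfft(frames, axis=1))

def centroid(mag, sr):
    """Magnitude-weighted mean frequency: the spectrum's 'center of mass'."""
    freqs = np.fft.rfftfreq((mag.shape[1] - 1) * 2, d=1.0 / sr)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)

def flux(mag):
    """L2 change between consecutive spectra; the first frame has zero flux."""
    diff = np.diff(mag, axis=0)
    return np.concatenate([[0.0], np.sqrt((diff ** 2).sum(axis=1))])

def flatness(mag):
    """Geometric over arithmetic mean: near 1 for noise, near 0 for pure tones."""
    gm = np.exp(np.mean(np.log(mag + 1e-12), axis=1))
    am = np.mean(mag, axis=1) + 1e-12
    return gm / am

# Demo signals: a steady tone vs. white noise
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).normal(size=sr)
```

A steady tone gives low flatness and near-zero flux; noise gives high flatness and large flux, which is why these features separate tonal from noisy material.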

For some reason, zero crossing rate really messed my models up: whenever I included it as a feature, accuracy dropped drastically, which I found surprising. Based on how it was described in the paper, I thought it might distinguish pieces with quieter parts or silences, but I'm not sure how that played out. I also experimented with changing the weights of the features to see whether that had any effect on the classifier. It did, so maybe it'll have an effect on the audio itself.
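Zero crossing rate is simple enough to sketch, and the sketch hints at one possible reason it hurt: near-silent frames of low-level noise flip sign almost every sample, so ZCR can read "noisy" exactly where the music is quiet (hypothetical NumPy helper, not the actual extractor):

```python
import numpy as np

def zero_crossing_rate(signal, frame_size=512, hop=256):
    """Fraction of adjacent sample pairs per frame whose sign differs."""
    n = 1 + (len(signal) - frame_size) // hop
    rates = np.empty(n)
    for i in range(n):
        frame = signal[i*hop : i*hop + frame_size]
        rates[i] = np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))
    return rates

sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)  # ZCR ~ 2 * 220 / sr, i.e. low
# Very quiet Gaussian noise: tiny amplitude, but the sign flips constantly
quiet_noise = 1e-4 * np.random.default_rng(0).normal(size=sr)
```

Here the loud, pitched tone gets a low ZCR while the nearly silent noise gets a high one, the opposite of what an intuitive "quietness" feature would report.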

Eventually I went with a 30-feature model, adding chroma, kurtosis, rolloff, and 13 MFCCs as mentioned in Perry Cook's paper, and got the following accuracy:

fold 0 accuracy: 0.4711

fold 1 accuracy: 0.4392

fold 2 accuracy: 0.4559

fold 3 accuracy: 0.4912

fold 4 accuracy: 0.4706

This was the highest accuracy I was able to achieve, and I do like the sounds that came from it.
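Two of the added features, rolloff and spectral kurtosis, can be sketched in NumPy as well (hypothetical helpers under my own simplifying assumptions, not the real extractor):

```python
import numpy as np

def spectral_rolloff(mag, sr, pct=0.85):
    """Frequency below which pct of the frame's spectral energy accumulates."""
    freqs = np.fft.rfftfreq((mag.shape[1] - 1) * 2, d=1.0 / sr)
    cum = np.cumsum(mag ** 2, axis=1)
    idx = (cum >= pct * cum[:, -1:]).argmax(axis=1)  # first bin past threshold
    return freqs[idx]

def spectral_kurtosis(mag):
    """Peakedness of the magnitude spectrum treated as a distribution over bins."""
    bins = np.arange(mag.shape[1])
    w = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)
    mean = (w * bins).sum(axis=1, keepdims=True)
    var = (w * (bins - mean) ** 2).sum(axis=1)
    m4 = (w * (bins - mean) ** 4).sum(axis=1)
    return m4 / (var ** 2 + 1e-12)

# Single demo frames: a 1 kHz tone (peaked spectrum) vs. white noise (flat)
sr = 22050
t = np.arange(1024) / sr
win = np.hanning(1024)
mag_tone = np.abs(np.fft.rfft(np.sin(2 * np.pi * 1000 * t) * win))[None, :]
mag_noise = np.abs(np.fft.rfft(np.random.default_rng(0).normal(size=1024) * win))[None, :]
```

The tone's rolloff sits right at the peak frequency and its kurtosis is much higher than the noise frame's, so the two features describe where the energy sits and how concentrated it is.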


Phase 2

For phase 2, I compiled some of my favorite anime openers and closers to be trained into the model. The idea was to create some sort of fusion between the opening and ending songs and see what outputs I could get. I also tried playing with the weights of the features, but I'm not really sure it changes much. I was able to get some of the videos to play at the same time in Chunity, but I have some ideas about shuffling the videos around, playing some iconic anime lines to drive the data, and maybe focusing on one anime opener and closer at a time somehow.


Unravel - Tokyo Ghoul


Kaikai Kitan - Jujutsu Kaisen


Battlecry - Samurai Champloo


Tank - Cowboy Bebop


Colors - Code Geass


My War - Attack on Titan


Out of Control (Nothing's Carved in Stone) - Psycho-Pass


Seishun Kyousoukyoku - Naruto


Crazy Noisy Bizarre Town - Jojo


Demons Butterfly - Devilman Crybaby Rap


This Fffire - Cyberpunk: Edgerunners


Asterisk - Bleach


Wind - Naruto


Lost in Paradise - Jujutsu Kaisen


Monster Without a Name - Psycho-Pass


I Really Want to Stay at Your House - Cyberpunk: Edgerunners




Devilman Rap - Devilman Crybaby; SoundCloud: ghost609


Saihate - Bleach


Shiki No Uta - Samurai Champloo


The Real Folk Blues - Cowboy Bebop


Great Escape - Attack on Titan


Kisetsu wa Tsugitsugi Shinde Iku - Tokyo Ghoul


Yuukyou Seishunka - Code Geass


Phase 3 Sketch


Thanks to Ge, Nick, Alex, Andrew, and Yikai for all their help in making this work.