Phase One: Extract, Classify, Validate
All possible combinations of the 8 features (centroid, flux, rms, mfcc, rolloff, zerox, chroma, kurtosis) are evaluated through cross-validation. The average classification accuracy for each of the 256 combinations of features is shown in the figure below (best viewed zoomed in).
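The sweep over feature combinations can be sketched as follows. This is a minimal Python illustration, not the ChucK code used in the project: the data is synthetic, each feature is treated as a single number (in practice mfcc and chroma are vectors), and the 1-nearest-neighbour classifier, the 5 interleaved folds, and the helper names `feature_subsets`, `nearest_label`, and `cv_accuracy` are all assumptions made for the example. Note that 2^8 = 256 counts the empty subset, which cannot be classified on, so the sketch scores the 255 non-empty ones.

```python
from itertools import combinations
import random

FEATURES = ["centroid", "flux", "rms", "mfcc", "rolloff", "zerox", "chroma", "kurtosis"]


def feature_subsets(names):
    """Yield every subset of names, from the empty tuple up to the full set."""
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            yield subset


def nearest_label(train, query, cols):
    """Label of the training row closest to query, using only the features in cols."""
    best = min(train, key=lambda row: sum((row[0][c] - query[c]) ** 2 for c in cols))
    return best[1]


def cv_accuracy(data, cols, folds=5):
    """Mean accuracy over `folds` interleaved cross-validation folds."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        hits = sum(nearest_label(train, x, cols) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / folds


# Synthetic stand-in data (assumption): only "centroid" separates the two classes,
# every other feature is uniform noise.
rng = random.Random(0)
data = []
for i in range(40):
    label = i % 2
    feats = {name: rng.random() for name in FEATURES}
    feats["centroid"] = label + 0.1 * rng.random()
    data.append((feats, label))

# Score every non-empty subset and report the best one.
results = {s: cv_accuracy(data, s) for s in feature_subsets(FEATURES) if s}
best = max(results, key=results.get)
print(len(results), "subsets scored; best:", best, results[best])
```

With real audio, `data` would hold the extracted feature values per frame or clip instead of random numbers; the enumeration and scoring loop stay the same.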
From the figure, we can observe that:
Phase Two: Designing an Audio Mosaic Tool
Wanna find out the similarities between PSY (the singer of Gangnam Style) and Steve Jobs (former CEO and co-founder of Apple)?
Features & Instructions:
Phase Three: Make a Musical Mosaic!
Creative statement:
To make this form of interaction more creative, I will polish the prototype into a more complete game. It will present a new form of musical mosaic and a new kind of human/music interaction in this scenario. More specifically, the sampling algorithm may be redesigned, and the difficulty will vary once a player scores a certain number of points. The entire visual representation will be significantly polished to be as 'artful' as possible.
Features & Instructions:
Reflections:
Diving deep into musical mosaics was meaningful and fun overall. The task has two major components: feature extraction and feature matching.
For feature extraction, although I have studied many combinations of FFT-based audio features, it is still hard to say which feature is the best to use.
Also, most of the features are specialised for musical audio analysis and may not be appropriate for describing speech. For this reason, I wonder whether automatic feature engineering might be a better approach.
Moreover, for feature matching, although KNN is a common approach to similarity-based feature retrieval (with good properties), it is not tolerant of noise or large variance.
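For reference, the matching step boils down to something like the sketch below. It is a plain-Python illustration under assumptions, not the project's ChucK implementation: the function name `knn`, the Euclidean distance, and the toy two-dimensional vectors are all chosen for the example.

```python
import heapq
import math


def knn(query, database, k=3):
    """Return the indices of the k database vectors closest to query (Euclidean distance)."""
    dists = ((math.dist(query, vec), idx) for idx, vec in enumerate(database))
    return [idx for _, idx in heapq.nsmallest(k, dists)]


# Toy feature vectors (assumption); each row stands for one audio window.
database = [
    [0.0, 0.0],
    [1.0, 1.0],
    [0.1, 0.0],
    [5.0, 5.0],
]
neighbours = knn([0.0, 0.1], database, k=2)
print(neighbours)
```

Because the distance sums raw squared differences, a single noisy or large-valued dimension can dominate the ranking, which is one way the sensitivity mentioned above shows up. Normalising each feature before matching is a common mitigation.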
Besides the reflections on the core of musical mosaic, I also acknowledge some limitations of my current implementation. The biggest limitation is its inability to support infinite levels with varying difficulty. In the current implementation, three levels are predefined with fixed difficulties.
Ideally, users should have access to a custom setting where they can change the difficulty of the game and navigate to any level they want.
Another limitation is the oscP5 communication mechanism. There is a way to transmit information from ChucK to Processing, but I have not yet found a way to do the same the other way around. As a result, most of the information has to be sent from ChucK to Processing repeatedly in an infinite loop. This is extremely inefficient and should be improved in the future.
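As a possible direction, the sketch below shows two ideas in plain Python: what an OSC message looks like on the wire, and how to send only when a value changes instead of on every loop iteration. It is not ChucK or Processing code and does not use oscP5; the address `/level`, the helper names, and the restriction to int and float arguments are assumptions for the example. OSC messages are commonly carried as UDP datagrams, so in principle either program could act as the sender.

```python
import socket
import struct


def _pad(raw: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC 1.0 strings require."""
    return raw + b"\x00" * (4 - len(raw) % 4)


def encode_osc(address: str, *args) -> bytes:
    """Encode an OSC message whose arguments are ints (int32) or floats (float32)."""
    tags, payload = ",", b""
    for arg in args:
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload


def send_osc(host: str, port: int, packet: bytes) -> None:
    """Send one already-encoded OSC packet as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


def changed_only(values):
    """Yield a value only when it differs from the previous one."""
    last = object()  # sentinel that equals nothing else
    for value in values:
        if value != last:
            yield value
            last = value


# "/level" is a hypothetical address; the game state here is made up.
packet = encode_osc("/level", 2)
updates = list(changed_only([1, 1, 2, 2, 2, 3, 1]))
print(packet, updates)
```

In use, each value from `changed_only` would be passed through `encode_osc` and `send_osc` with whatever host and port the receiving program listens on, so a level change produces one message rather than a continuous stream.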
In general, I like the concept of creating a music mosaic through KNN-based feature matching to find coherent music pieces from different sources, and I enjoyed this project a lot. By extracting and analyzing specific features such as tempo, rhythm, pitch, and timbre from various songs, KNN can identify and group similar musical segments. This process enables the generation of a music mosaic, where segments from different tracks are stitched together based on their similarities, becoming a new representation of music. This technique not only showcases the potential for innovative audio creations but also highlights the power of AI/ML algorithms in understanding and manipulating complex patterns within music. The ability to merge diverse musical elements into a cohesive whole opens up new avenues for creative expression and exploration within the music industry, offering listeners a unique auditory journey through familiar yet distinctly new soundscapes.
Acknowledgements
Both the ChucK scripts and the Processing scripts borrow a lot from the sample code provided in this course. The official Processing documentation was of great help. Tiange thanks Ge Wang and Andrew for their help.
Please send any questions to Tiange Xiang. Template from here.