Project #2: K-TILE

Laura Schütz

Phase Three: Make a Musical Mosaic





This project is part of a class assignment on a feature-based audio mosaic tool. It uses the audio programming language ChucK and the game engine Unity to create a musical statement. An explanation of the different phases of the process can be found below. My project allows for the exploration of the parameter k, the value used during similarity retrieval when the model generates sound from an input sound file. The minimalist educational tool lets the user explore the correlation between similarity retrieval and the sound that the model produces. It is an attempt at abstractly visualizing the dimensions of this music model and how the AI synthesizes sound - by taking one sample from the pool of similar samples in n-dimensional feature space and making an audio mosaic from it. The project is called K-TILE, as k decides which tile is picked for the mosaic of sounds.
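To make the role of k concrete, here is a minimal ChucK sketch of the idea (not the project's actual code): measure the distance from the current input's feature vector to every stored feature vector, keep the k nearest ones, and pick one of them at random as the next tile. The function names and the use of squared Euclidean distance are illustrative assumptions.

    // squared Euclidean distance between two feature vectors
    fun float dist2( float a[], float b[] )
    {
        0.0 => float sum;
        for( 0 => int i; i < a.size(); i++ )
        {
            a[i] - b[i] => float d;
            d * d +=> sum;
        }
        return sum;
    }

    // return the index of one sample chosen among the k nearest to the query
    fun int pickTile( float query[], float pool[][], int k )
    {
        // distance from the query to every stored sample
        float d[ pool.size() ];
        for( 0 => int i; i < pool.size(); i++ )
            dist2( query, pool[i] ) => d[i];

        // indices of the k smallest distances (simple repeated selection)
        int nearest[k];
        for( 0 => int n; n < k; n++ )
        {
            0 => int best;
            for( 0 => int i; i < pool.size(); i++ )
                if( d[i] < d[best] ) i => best;
            best => nearest[n];
            // exclude this sample from the next pass
            1000000000.0 => d[best];
        }

        // a small k stays close to the query; a large k lets in
        // less similar, more surprising tiles
        return nearest[ Math.random2( 0, k-1 ) ];
    }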






The choice of visuals and colors was inspired by the artworks of Piet Mondrian and the artistic movement De Stijl. De Stijl was closely tied to the Bauhaus, a school of design with strong modernist influences. The Bauhaus emphasised simplicity and effectiveness as well as modularity in its views on architecture. Like Bauhaus design, the audio mosaic is modular, allowing for an abundance of expression through combining ready-made elements in new ways. The choice of sound might be surprising, but I thought that the juxtaposition of the audio-making process - something so highly technological - with one of the most natural things - the song of a bird - might be a combination that elicits wonder and joy. It is not an attempt at humanizing AI, but rather a way of saying that not everything AI-related needs to be serious.




Reflection

In Phase 3 I focused on two things. First, I tried to make the sound more responsive to changes in k, to create a stronger audio-visual correspondence than in Phase 2. I ran multiple trials with different music genres, numbers of instruments in a song, audio file lengths, combinations of files, and changes in parameters. The task proved to be more difficult than expected. When using multiple audio files in the feature extraction step, the synthesis would often pick samples from only one file, or pick parts of a song that didn't sound very nice when repeated several times in a row. To remove the variability introduced by changing the input audio while testing, I decided to use a recording of steady rain as the input file. That helped me test how the similarity of the sound samples, as it depends on k, influences the final audiovisual experience. This process of trial and error took up most of my time. Lastly, I experimented with nature sounds and really came to appreciate the bird songs: although they might be similar in audio features, they are very distinct in melody and easy to distinguish within a soundscape. Looking at the final result, I am happy to have created a simple yet effective tool for explaining the k value within an audio mosaic tool. By working on this project I learned a lot about AI, music analysis, and music synthesis. I am also glad that my first encounter with AI is through music, as I get to work with it in a joyful manner and because all the limitations and considerations of working with such a model become very apparent when every change in the code is sonified.

Code and Usage Instruction

The Unity project, including the ChucK files, can be found here.
Pressing the W and S keys increases and decreases k respectively, changing both the sound and the visuals.
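In the project the key presses are handled on the Unity side. Purely as an illustration of how Unity can drive the ChucK side, a ChucK file might expose k as a global variable that Unity (via Chunity) updates whenever W or S is pressed; the variable name K and the polling loop below are assumptions, not the project's actual code.

    // expose k to the host as a global variable; "K" is an assumed name
    global int K;
    5 => K;  // an arbitrary starting value

    while( true )
    {
        // the Unity side would update K on W/S key presses;
        // here we just report the current value every 100 ms
        <<< "current k:", K >>>;
        100::ms => now;
    }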

Acknowledgement

The starter code for this project was provided by Ge Wang, Yikai Li and Andrew Zhu Aday.
The audio is synthesized from a video by Wildlife World featuring 50 species of European birds and their songs.



Phase Two: Designing an Audio Mosaic Tool





In Phase 2 the model was given audio files to extract features from. I experimented with audio with and without vocals, and I also tried denser and quieter songs. When using two or more files to extract features from, it was hard to achieve an AI-created sound that was pleasant or melodic. After some trials I decided to go with a compilation of handpan songs. The advantage of the handpan, or any percussion instrument, over continuous sounds is that the AI can mosaic the sound together in any way and it will still sound quite natural. With vocals or continuous sounds the mosaic can become more abrupt and choppy. That choppiness could also be an interesting property and a characteristic unique to AI-generated music.

To create the sound, an audio input (microphone or audio file) is analyzed and similarity retrieval based on feature vectors is performed to build a mosaic of sounds. During similarity retrieval, the parameter k defines how many of the most similar samples the AI can pick from. A k value of 5 is relatively small; a k value of 50 is large and leads to less similar, more random-sounding picks. Being new to AI myself, I wanted to visualize the picking of similar sound samples to help me understand it. Inspired by artworks of Piet Mondrian and Gerhard Richter, I chose a grid of colored tiles to represent the picking of the samples. As k increases, the radius around the center tile and the number of colors increase as well.
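As a rough sketch of the analysis side (not the starter code itself), ChucK's built-in unit analyzers can be chained into a FeatureCollector so that every analysis frame yields one feature vector, which is then handed to the similarity retrieval step. The feature set, FFT size, and hop size below are illustrative choices.

    // audio input into an FFT (a SndBuf playing a file would work the same way)
    adc => FFT fft => blackhole;
    // collects several features into one vector per analysis frame
    FeatureCollector combo => blackhole;
    // spectral features feeding the collector
    fft =^ Centroid centroid =^ combo;
    fft =^ Flux flux =^ combo;
    fft =^ RMS rms =^ combo;
    fft =^ MFCC mfcc =^ combo;

    // illustrative analysis parameters
    1024 => fft.size;
    Windowing.hann( fft.size() ) => fft.window;
    (fft.size() / 2)::samp => dur HOP;

    while( true )
    {
        // compute all features for the current frame
        combo.upchuck();
        // combo.fval(0) ... combo.fval(n-1) now hold the feature vector
        // that would be passed to the similarity retrieval step
        HOP => now;
    }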




Acknowledgement

The starter code for this project was provided by Ge Wang, Yikai Li and Andrew Zhu Aday. The handpan music used to create the sound mosaic for Phase 2 is by Malte Marten.


Phase One: Extract, Classify, Validate

When extracting features from the training dataset, it was difficult to figure out which exact combination of analyzers would yield the best results. MFCCs are clearly very important: using them as the only extracted feature still resulted in a high accuracy, much higher than when I combined all of the features, so quantity alone is not the solution. I went on to tweak the extract time and the number of MFCC coefficients, and that again helped bring up the accuracy in cross-validation. But I never made it beyond a cross-validation accuracy of 0.47 for any of the combinations and parameter settings I tried. I ended up with the following combination in my final ChucK file for Phase 1: centroid, flux, RMS, MFCC, rolloff, kurtosis, with an extract time of 10 and 10 MFCC coefficients. All three files needed for extraction, classification, and validation can be found here.
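As an illustration (not the actual Phase 1 extraction file), the final combination listed above could be chained into a FeatureCollector roughly as follows; the constant standing in for the extract time is a placeholder name, not necessarily the starter code's variable.

    // feature extraction chain for the final Phase 1 combination (illustrative)
    // a training file would be loaded into buf here with buf.read(...)
    SndBuf buf => FFT fft => blackhole;
    FeatureCollector combo => blackhole;
    fft =^ Centroid centroid =^ combo;
    fft =^ Flux flux =^ combo;
    fft =^ RMS rms =^ combo;
    fft =^ MFCC mfcc =^ combo;
    fft =^ RollOff rolloff =^ combo;
    fft =^ Kurtosis kurtosis =^ combo;

    // number of MFCC coefficients used in the final settings
    10 => mfcc.numCoeffs;
    // how long each training file is analyzed ("extract time");
    // EXTRACT_TIME is an assumed placeholder name
    10::second => dur EXTRACT_TIME;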