Using cross-validation, the best evaluation result I can achieve is between 0.4 and 0.5. I first tested whether the FFT size has an effect on accuracy: changing it from 4096 to 2048 to 1024 makes no noticeable difference.
1) Baseline with a 1024-sample FFT size
# of data points: 1000 dimensions: 23
fold 0 accuracy: 0.4554
fold 1 accuracy: 0.3907
fold 2 accuracy: 0.4319
fold 3 accuracy: 0.4152
fold 4 accuracy: 0.4412
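The per-fold numbers above come from k-fold cross-validation. As a rough illustration of the procedure (not the actual course code, which runs in ChucK/Python tooling), here is a minimal sketch: shuffle the data, split it into five folds, train a simple nearest-centroid classifier on four folds, and score on the held-out fold. The synthetic data, class count, and classifier choice are all assumptions for the sake of a runnable example.

```python
import numpy as np

def cross_validate(features, labels, k_folds=5, seed=0):
    """Shuffle, split into k folds, and return per-fold accuracy
    of a simple nearest-centroid classifier (an assumption here;
    the real pipeline may use kNN or another model)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    folds = np.array_split(order, k_folds)
    accuracies = []
    for i in range(k_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k_folds) if j != i])
        X_tr, y_tr = features[train_idx], labels[train_idx]
        X_te, y_te = features[test_idx], labels[test_idx]
        # one centroid per class from the training folds
        classes = np.unique(y_tr)
        centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
        # classify each test point by its nearest centroid
        d = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
        preds = classes[d.argmin(axis=1)]
        accuracies.append(float((preds == y_te).mean()))
    return accuracies

# toy run matching the shape above: 1000 points, 23 dimensions
rng = np.random.default_rng(1)
labels = rng.integers(0, 10, size=1000)            # 10 hypothetical classes
features = rng.normal(size=(1000, 23)) + labels[:, None] * 0.3
for i, acc in enumerate(cross_validate(features, labels)):
    print(f"fold {i} accuracy: {acc:.4f}")
```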
Before experimenting, I assumed that chroma, with its pitch information, would provide a clear reference for classification. I do see a small difference: accuracy in all five folds is above 0.4.
2) Add Chroma
# of data points: 1000 dimensions: 35
fold 0 accuracy: 0.4039
fold 1 accuracy: 0.4304
fold 2 accuracy: 0.4760
fold 3 accuracy: 0.4225
fold 4 accuracy: 0.4191
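Adding chroma here raises the feature vector from 23 to 35 dimensions (12 chroma bins appended to the MFCC-based statistics). A minimal sketch of that concatenation, using random stand-in arrays since the real features come from the audio pipeline; the per-block normalization is my own assumption so neither feature group dominates the distance metric:

```python
import numpy as np

# stand-in per-file feature vectors: 23 MFCC-based dims + 12 chroma bins
mfcc_stats = np.random.default_rng(0).normal(size=(1000, 23))
chroma = np.abs(np.random.default_rng(1).normal(size=(1000, 12)))

def zscore(x):
    """Normalize each column to zero mean, unit variance."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-9)

# concatenate into the 35-dimensional vectors reported above
combined = np.hstack([zscore(mfcc_stats), zscore(chroma)])
print(combined.shape)  # (1000, 35)
```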
Decreasing the MFCC size from 20 to 5 decreases accuracy.
3) Reduce MFCC size to 5
# of data points: 1000 dimensions: 20
fold 0 accuracy: 0.3897
fold 1 accuracy: 0.4074
fold 2 accuracy: 0.3627
fold 3 accuracy: 0.3848
fold 4 accuracy: 0.4230
4) Add Kurtosis
DO NOT USE kurtosis! Trust me.
The Phase Two prototype is an experiment in generating musical mosaics based on the sonic quality and meditative rhythm of ambient soundscapes. It is an interactive tool offering four soundscapes: Farm, Desert, Rain, and Crowds, each representing a distinct location, with a vision of creating complementary cinematic shots that emphasize the emotional experience. Users can switch between soundscapes in real time with keyboard controls to generate musical mosaics, and can also mix the balance between the soundscape and the generated music to their desired level.
1-5: Knob control
6-9: Mix control (6 = only soundscape; 9 = only musical mosaics)
Q, W, E, R will switch between the four available soundscapes
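The mix control can be thought of as a crossfade between the two signals. The actual instrument is written in ChucK; this Python sketch only illustrates the mapping, and the linear fade is an assumption (an equal-power fade with square-root gains is a common alternative):

```python
import numpy as np

def mix(soundscape, mosaic, level):
    """Linear crossfade: level 0.0 = only soundscape, 1.0 = only mosaic."""
    level = float(np.clip(level, 0.0, 1.0))
    return (1.0 - level) * soundscape + level * mosaic

a = np.ones(4)           # stand-in soundscape samples
b = np.full(4, -1.0)     # stand-in mosaic samples
print(mix(a, b, 0.0))    # only soundscape -> [1. 1. 1. 1.]
print(mix(a, b, 1.0))    # only mosaic -> [-1. -1. -1. -1.]
```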
---
Phase 2 Source Code Download
1-5: Knob control
Left and right arrow keys control the mix
Q, W, E, R will switch between four available soundscapes
Q - Redwood Basin National Park
W - Lop Nur Desert
E - Sunset
R - A resting dog on the Mongolian Grassland
Phase 3 Source Code Download
Phase Three builds on the Phase Two prototype, offering an improved visual presentation of the vector space and mix rate, accompanied by cinematic visuals. The milestone feedback was really helpful, especially the suggestions to allow finer control of granularity and, instead of printing out raw numbers and values, to create a visual reference for where we are in the vector space. So that is what I did.
Working on this project felt significantly different from creating a non-interactive poetic musical piece. This was partly because using GOFAI word2vec in poem creation was a struggle, making me feel as though I was writing with a handicap. Making music with ChucK, however, was enjoyable: it opened a new door for me.
I felt deeply involved in creating this project. It was no longer just about inputting a word, crossing my fingers, and hoping the AI model would produce a sensible output. Every aspect, from music source selection and database creation to choosing extraction sizes, felt like part of a comprehensive design process. Initially, I hadn't approached this project from the perspective of designing an instrument. Once I did, that realization provided a clear guideline for my approach, directing my input to generate sound and music according to a cohesive vision.