Using cross-validation, the best evaluation result I can achieve is between 0.4 and 0.5. I first tested whether the FFT size has an effect on accuracy: changing it from 4096 to 2048 to 1024 makes no noticeable difference.
1) Baseline with a 1024-sample FFT size
# of data points: 1000 dimensions: 23
fold 0 accuracy: 0.4554
fold 1 accuracy: 0.3907
fold 2 accuracy: 0.4319
fold 3 accuracy: 0.4152
fold 4 accuracy: 0.4412
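The per-fold numbers above come from k-fold cross-validation. As a rough illustration of the procedure (not the actual course code, which runs in ChucK/Python tooling), here is a minimal sketch: shuffle the data, split it into five folds, train a simple nearest-centroid classifier on four folds, and score on the held-out fold. The synthetic data, class count, and classifier choice are all assumptions for the sake of a runnable example.

```python
import numpy as np

def cross_validate(features, labels, k_folds=5, seed=0):
    """Shuffle, split into k folds, and return per-fold accuracy
    of a simple nearest-centroid classifier (an assumption here;
    the real pipeline may use kNN or another model)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    folds = np.array_split(order, k_folds)
    accuracies = []
    for i in range(k_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k_folds) if j != i])
        X_tr, y_tr = features[train_idx], labels[train_idx]
        X_te, y_te = features[test_idx], labels[test_idx]
        # one centroid per class from the training folds
        classes = np.unique(y_tr)
        centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
        # classify each test point by its nearest centroid
        d = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
        preds = classes[d.argmin(axis=1)]
        accuracies.append(float((preds == y_te).mean()))
    return accuracies

# toy run matching the shape above: 1000 points, 23 dimensions
rng = np.random.default_rng(1)
labels = rng.integers(0, 10, size=1000)            # 10 hypothetical classes
features = rng.normal(size=(1000, 23)) + labels[:, None] * 0.3
for i, acc in enumerate(cross_validate(features, labels)):
    print(f"fold {i} accuracy: {acc:.4f}")
```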
Before experimenting, I assumed that chroma, with its pitch information, would provide a clear reference for classification. I do see a small difference: accuracy in all five folds is above 0.4.
2) Add Chroma
# of data points: 1000 dimensions: 35
fold 0 accuracy: 0.4039
fold 1 accuracy: 0.4304
fold 2 accuracy: 0.4760
fold 3 accuracy: 0.4225
fold 4 accuracy: 0.4191
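Adding chroma here raises the feature vector from 23 to 35 dimensions (12 chroma bins appended to the MFCC-based statistics). A minimal sketch of that concatenation, using random stand-in arrays since the real features come from the audio pipeline; the per-block normalization is my own assumption so neither feature group dominates the distance metric:

```python
import numpy as np

# stand-in per-file feature vectors: 23 MFCC-based dims + 12 chroma bins
mfcc_stats = np.random.default_rng(0).normal(size=(1000, 23))
chroma = np.abs(np.random.default_rng(1).normal(size=(1000, 12)))

def zscore(x):
    """Normalize each column to zero mean, unit variance."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-9)

# concatenate into the 35-dimensional vectors reported above
combined = np.hstack([zscore(mfcc_stats), zscore(chroma)])
print(combined.shape)  # (1000, 35)
```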
Decreasing the MFCC size from 20 to 5 decreases accuracy.
3) Reduce MFCC size to 5
# of data points: 1000 dimensions: 20
fold 0 accuracy: 0.3897
fold 1 accuracy: 0.4074
fold 2 accuracy: 0.3627
fold 3 accuracy: 0.3848
fold 4 accuracy: 0.4230
4) Add Kurtosis
DO NOT USE kurtosis! Trust me.
The Phase Two prototype is an experiment in generating musical mosaics based on the sonic quality and meditative rhythm of ambient soundscapes. It is an interactive tool offering four soundscapes: Farm, Desert, Rain, and Crowds, each representing a distinct location, with a vision of creating complementary cinematic shots that emphasize the emotional experience. Users can switch between soundscapes in real time with keyboard controls to generate musical mosaics, and can also mix the balance between the soundscape and the generated music to their desired level.
1-5: Knob control
6-9: Mix control (6 = only soundscape; 9 = only musical mosaics)
Q, W, E, R will switch between the four available soundscapes
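The mix control can be thought of as a crossfade between the two signals. The actual instrument is written in ChucK; this Python sketch only illustrates the mapping, and the linear fade is an assumption (an equal-power fade with square-root gains is a common alternative):

```python
import numpy as np

def mix(soundscape, mosaic, level):
    """Linear crossfade: level 0.0 = only soundscape, 1.0 = only mosaic."""
    level = float(np.clip(level, 0.0, 1.0))
    return (1.0 - level) * soundscape + level * mosaic

a = np.ones(4)           # stand-in soundscape samples
b = np.full(4, -1.0)     # stand-in mosaic samples
print(mix(a, b, 0.0))    # only soundscape -> [1. 1. 1. 1.]
print(mix(a, b, 1.0))    # only mosaic -> [-1. -1. -1. -1.]
```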
---
Phase 2 Source Code Download
1-5: Knob control
Left and right arrow keys control the mix
Q, W, E, R will switch between four available soundscapes
Q - Redwood Basin National Park
W - Lop Nur Desert
E - Sunset
R - A resting dog on the Mongolian Grassland
Phase 3 Source Code Download
Phase Three builds on the Phase Two prototype, offering an improved visual presentation of the vector space and mix rate, accompanied by cinematic visuals. The milestone feedback was really helpful, especially the suggestions to allow finer control of granularity and, instead of printing out raw numbers and values, to create a visual reference for where we are in the vector space. So that is what I did.
Working on this project felt significantly different from creating a non-interactive poetic musical piece. This was partly because using GOFAI word2vec in poem creation was a struggle, making me feel as though I was writing with a handicap. Making music with ChucK, however, was enjoyable: it opened a new door for me.
I felt deeply involved in creating this project. It was no longer just about inputting a word, crossing my fingers, and hoping the AI model would produce a sensible output. Every aspect, from music source selection and database creation to choosing extraction sizes, felt like part of a comprehensive design process. Initially, I hadn't approached this project from the perspective of designing an instrument. Once I did, that realization provided a clear guideline for my approach, directing my input to generate sound and music according to a cohesive vision.