Music 356 project #2

Soohyun Kim

 

Phase 1

1. The sample code (20 MFCC coefficients, Centroid, Flux, RMS = 23 dimensions)

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-23.txt 

# of data points: 1000 dimensions: 23 

fold 0 accuracy: 0.4275

fold 1 accuracy: 0.4338

fold 2 accuracy: 0.4221

fold 3 accuracy: 0.4304

fold 4 accuracy: 0.4608

>> Baseline

 

2. Half hop size (20 MFCC coefficients, Centroid, Flux, RMS = 23 dimensions)

Hop size halved

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-23_half.txt

# of data points: 1000 dimensions: 23 

fold 0 accuracy: 0.3985

fold 1 accuracy: 0.4328

fold 2 accuracy: 0.4343

fold 3 accuracy: 0.4373

fold 4 accuracy: 0.4309

>> Slightly worse than the baseline

 

3. Only MFCC (20 MFCC coefficients = 20 dimensions)

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-20.txt

# of data points: 1000 dimensions: 20 

fold 0 accuracy: 0.4328

fold 1 accuracy: 0.3887

fold 2 accuracy: 0.3819

fold 3 accuracy: 0.4054

fold 4 accuracy: 0.4152

>> Worse than the baseline

 

4. Add Kurtosis and SFM together (20 MFCC coefficients, Centroid, Flux, RMS, Kurtosis, SFM = 48 dimensions)

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-full.txt 

# of data points: 1000 dimensions: 48 

fold 0 accuracy: 0.1078

fold 1 accuracy: 0.0980

fold 2 accuracy: 0.1127

fold 3 accuracy: 0.0931

fold 4 accuracy: 0.0975

>> Disaster

 

5. Add Kurtosis only (20 MFCC coefficients, Centroid, Flux, RMS, Kurtosis = 24 dimensions)

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-add-Kurtosis.txt

# of data points: 1000 dimensions: 24 

fold 0 accuracy: 0.4333

fold 1 accuracy: 0.4132

fold 2 accuracy: 0.3951

fold 3 accuracy: 0.4392

fold 4 accuracy: 0.4319

>> Slightly worse than the baseline

 

6. Add SFM only (20 MFCC coefficients, Centroid, Flux, RMS, SFM = 47 dimensions)

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-SFM-MFCC20.txt

# of data points: 1000 dimensions: 47 

fold 0 accuracy: 0.4853

fold 1 accuracy: 0.4814

fold 2 accuracy: 0.4917

fold 3 accuracy: 0.5005

fold 4 accuracy: 0.5113

>> THE BEST !!!!!!

 

7. Add SFM only with 40 MFCC coefficients (40 MFCC coefficients, Centroid, Flux, RMS, SFM = 67 dimensions)

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-SFM-MFCC40.txt

# of data points: 1000 dimensions: 67 

fold 0 accuracy: 0.4794

fold 1 accuracy: 0.4549

fold 2 accuracy: 0.4431

fold 3 accuracy: 0.5074

fold 4 accuracy: 0.5132

>> Slightly worse than the best

 

8. Add SFM only with 60 MFCC coefficients (60 MFCC coefficients, Centroid, Flux, RMS, SFM = 87 dimensions)

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-SFM-MFCC60.txt

# of data points: 1000 dimensions: 87 

fold 0 accuracy: 0.4863

fold 1 accuracy: 0.4397

fold 2 accuracy: 0.4775

fold 3 accuracy: 0.4632

fold 4 accuracy: 0.4735

>> Worse than the best

 

9. Add SFM only with 80 MFCC coefficients (80 MFCC coefficients, Centroid, Flux, RMS, SFM = 107 dimensions)

(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-SFM-MFCC80.txt 

# of data points: 1000 dimensions: 107 

fold 0 accuracy: 0.4515

fold 1 accuracy: 0.4721

fold 2 accuracy: 0.4397

fold 3 accuracy: 0.4789

fold 4 accuracy: 0.4613

>> Gets worse as the number of MFCC coefficients increases

 

What I have learned

- Halving the hop size (more frequent analysis) does not improve performance; it is slightly worse than the baseline.

- Using only MFCC performs worse than the baseline (it is better to include Centroid, Flux, and RMS as well).

- Adding SFM gives the best performance. But adding Kurtosis together with SFM is a disaster: accuracy drops to about 10%.

- With SFM added, performance gets worse as the number of MFCC coefficients increases.
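
To compare the nine runs at a glance, the per-fold accuracies from the logs above can be averaged. The following is a minimal Python sketch, separate from the ChucK workflow; the configuration labels are shorthand introduced here for illustration, and the numbers are copied from the x-validate.ck output.

```python
from statistics import mean

# Per-fold accuracies copied from the x-validate.ck logs above.
# The keys are illustrative labels, not names used by the ChucK scripts.
FOLDS = {
    "1 baseline (23 dims)":        [0.4275, 0.4338, 0.4221, 0.4304, 0.4608],
    "2 half hop size (23 dims)":   [0.3985, 0.4328, 0.4343, 0.4373, 0.4309],
    "3 MFCC only (20 dims)":       [0.4328, 0.3887, 0.3819, 0.4054, 0.4152],
    "4 Kurtosis + SFM (48 dims)":  [0.1078, 0.0980, 0.1127, 0.0931, 0.0975],
    "5 Kurtosis only (24 dims)":   [0.4333, 0.4132, 0.3951, 0.4392, 0.4319],
    "6 SFM, 20 MFCC (47 dims)":    [0.4853, 0.4814, 0.4917, 0.5005, 0.5113],
    "7 SFM, 40 MFCC (67 dims)":    [0.4794, 0.4549, 0.4431, 0.5074, 0.5132],
    "8 SFM, 60 MFCC (87 dims)":    [0.4863, 0.4397, 0.4775, 0.4632, 0.4735],
    "9 SFM, 80 MFCC (107 dims)":   [0.4515, 0.4721, 0.4397, 0.4789, 0.4613],
}


def mean_accuracies(folds):
    """Return the mean accuracy over the folds of each configuration."""
    return {name: mean(values) for name, values in folds.items()}


def rank(folds):
    """Return configuration labels sorted from best to worst mean accuracy."""
    means = mean_accuracies(folds)
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    means = mean_accuracies(FOLDS)
    for name in rank(FOLDS):
        print(f"{name:28s} {means[name]:.4f}")
```

With these numbers, run 6 (SFM with 20 MFCC coefficients) has the highest mean, about 0.494, compared with about 0.435 for the baseline, and run 4 (Kurtosis and SFM together) sits near 0.10.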

 

 

 

 

Phase 2

Guitar Duel with Machine

 

Demo Audio (Prototype)

 

 

A typical blues guitar duel. One player plays lead guitar while the other plays rhythm guitar. They switch roles later.

 

 

 

 

 

Phase 3

Guitar Duel with Machine

 

Demo Video:

https://youtu.be/MPWlRdHZLJM

 

Reflection:

 

This project is basically about making an AI guitarist that I can jam with! It imitates the form of a typical two-player blues guitar jam (or, say, a blues guitar duel), where one player plays lead guitar while the other plays rhythm guitar, and they switch roles later. So the system has two modes: in lead guitar mode it plays lead while I play rhythm, and in rhythm guitar mode it plays rhythm while I play lead. By pressing the spacebar, I can switch the system's mode.

 

For this project, I made my own guitar sound dataset for both the lead guitar mode and the rhythm guitar mode by recording my own playing. Every guitar sample is 1 or 2 measures long at BPM 84. This BPM value was chosen to make the real-time similarity retrieval loop work:

 

fft_size + fft_size * (NUM_FRAMES / m) = (4 beats) * (60 / BPM seconds) * (44100 Hz),

where hop_size = fft_size / m.

 

Suddenly I was confronted with this indeterminate equation, as if I were solving a practice problem in an elementary number theory class. Thankfully, 44100 has many divisors, which lets the equation hold. BPM = 84 is one of the best divisors to choose, as it allows hop_size = fft_size / 2 and NUM_FRAMES = 61. The number 84 was like a savior to me in this project, as it is also a nice BPM for playing the guitar.
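
The arithmetic behind this choice can be checked with a short Python sketch. It plugs BPM = 84, m = 2, and NUM_FRAMES = 61 into the equation above and solves for fft_size. Note that the resulting fft_size is derived from the equation here and is not a value stated elsewhere in this report; the function names are made up for illustration.

```python
from fractions import Fraction

SAMPLE_RATE = 44100      # Hz, as in the equation above
BEATS_PER_MEASURE = 4    # "4 beats" in the equation above


def samples_per_measure(bpm):
    """Right-hand side: (4 beats) * (60 / BPM seconds) * (44100 Hz)."""
    return Fraction(BEATS_PER_MEASURE * 60 * SAMPLE_RATE, bpm)


def fft_size_for(bpm, m, num_frames):
    """Solve fft_size + fft_size * (num_frames / m) = samples per measure.

    Returns an exact Fraction; the parameters only work in practice
    if the result is a whole number of samples.
    """
    return samples_per_measure(bpm) / (1 + Fraction(num_frames, m))


if __name__ == "__main__":
    total = samples_per_measure(84)
    fft_size = fft_size_for(bpm=84, m=2, num_frames=61)
    print("samples per measure:", total)
    print("implied fft_size:   ", fft_size)
    print("whole number?       ", fft_size.denominator == 1)
```

For BPM = 84 the measure is exactly 126000 samples long, and with m = 2 and NUM_FRAMES = 61 the left-hand side is 31.5 * fft_size, which gives a whole-number fft_size of 4000.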

 

To keep the system's output harmonically correct and musically sensible, I kept all the audio data (my recorded guitar sounds) in the same harmonic scale. The system is therefore very restricted in terms of musical versatility. I actually considered using chroma features so that the system could detect the scale of the input, but I did not have enough time to explore that far...

 

To be honest, I think this system is too sensitive, in a bad way. The output changes sharply with subtle changes in the volume pedal and my picking strength. I tried several weight values for the feature vectors, but the system still remained too sensitive. So it was just me overcoming this by training my own sensitivity to the system's sensitivity. Luckily, I am a well-trained enough guitarist to do so. But for a user who cannot subtly control the velocity of their playing, this system would be really difficult to play with.
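
As a rough illustration of how feature weights interact with loudness in similarity retrieval, here is a toy Python sketch. Everything in it is hypothetical: the two-dimensional (timbre, loudness) features, the sample names, and the weight values are invented for the example and are not the features or weights used in this system.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest(query, database, weights):
    """Return the name of the database entry closest to the query."""
    return min(
        database,
        key=lambda name: weighted_distance(query, database[name], weights),
    )


# Hypothetical database: each sample is described by (timbre, loudness).
DATABASE = {
    "soft_lick": (0.50, 0.20),
    "loud_lick": (0.60, 0.80),
}

# A phrase whose timbre is close to "loud_lick" but played quietly.
QUERY = (0.58, 0.30)

if __name__ == "__main__":
    # Equal weights: the loudness difference dominates the match.
    print(nearest(QUERY, DATABASE, weights=(1.0, 1.0)))
    # Loudness heavily down-weighted: the timbre difference decides instead.
    print(nearest(QUERY, DATABASE, weights=(1.0, 0.01)))
```

In this toy setup, equal weights retrieve "soft_lick" while down-weighting loudness retrieves "loud_lick", which shows how a small change in playing level can flip the retrieved sample when loudness-related features carry a lot of weight.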

 

Notwithstanding the limitations of this system, I am happy that I could actually record a nice guitar duet with it, one that makes musical sense. I worried a lot that it would not work as I expected... haha.

 

Acknowledgement: I started from the sample code mosaic-synth-osc-kb.ck (v1.3) by Ge Wang and Yikai Li.