1. The sample code (20 MFCC coefficients, Centroid, Flux, RMS = 23 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-23.txt
# of data points: 1000 dimensions: 23
fold 0 accuracy: 0.4275
fold 1 accuracy: 0.4338
fold 2 accuracy: 0.4221
fold 3 accuracy: 0.4304
fold 4 accuracy: 0.4608
>> Baseline
2. Half hop size (20 MFCC coefficients, Centroid, Flux, RMS = 23 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-23_half.txt
# of data points: 1000 dimensions: 23
fold 0 accuracy: 0.3985
fold 1 accuracy: 0.4328
fold 2 accuracy: 0.4343
fold 3 accuracy: 0.4373
fold 4 accuracy: 0.4309
>> Slightly worse than the baseline
3. Only MFCC (20 MFCC coefficients = 20 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-20.txt
# of data points: 1000 dimensions: 20
fold 0 accuracy: 0.4328
fold 1 accuracy: 0.3887
fold 2 accuracy: 0.3819
fold 3 accuracy: 0.4054
fold 4 accuracy: 0.4152
>> Worse than the baseline
4. Add Kurtosis and SFM together (20 MFCC coefficients, Centroid, Flux, RMS, Kurtosis, SFM = 48 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-full.txt
# of data points: 1000 dimensions: 48
fold 0 accuracy: 0.1078
fold 1 accuracy: 0.0980
fold 2 accuracy: 0.1127
fold 3 accuracy: 0.0931
fold 4 accuracy: 0.0975
>> Disaster (every fold drops to about 0.10 accuracy)
5. Add Kurtosis only (20 MFCC coefficients, Centroid, Flux, RMS, Kurtosis = 24 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-add-Kurtosis.txt
# of data points: 1000 dimensions: 24
fold 0 accuracy: 0.4333
fold 1 accuracy: 0.4132
fold 2 accuracy: 0.3951
fold 3 accuracy: 0.4392
fold 4 accuracy: 0.4319
>> Slightly worse than the baseline
6. Add SFM only (20 MFCC coefficients, Centroid, Flux, RMS, SFM = 47 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-SFM-MFCC20.txt
# of data points: 1000 dimensions: 47
fold 0 accuracy: 0.4853
fold 1 accuracy: 0.4814
fold 2 accuracy: 0.4917
fold 3 accuracy: 0.5005
fold 4 accuracy: 0.5113
>> THE BEST !!!!!!
7. Add SFM only and 40 MFCC coefficients (40 MFCC coefficients, Centroid, Flux, RMS, SFM = 67 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-SFM-MFCC40.txt
# of data points: 1000 dimensions: 67
fold 0 accuracy: 0.4794
fold 1 accuracy: 0.4549
fold 2 accuracy: 0.4431
fold 3 accuracy: 0.5074
fold 4 accuracy: 0.5132
>> Slightly worse than the best
8. Add SFM only and 60 MFCC coefficients (60 MFCC coefficients, Centroid, Flux, RMS, SFM = 87 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-SFM-MFCC60.txt
# of data points: 1000 dimensions: 87
fold 0 accuracy: 0.4863
fold 1 accuracy: 0.4397
fold 2 accuracy: 0.4775
fold 3 accuracy: 0.4632
fold 4 accuracy: 0.4735
>> Worse than the best
9. Add SFM only and 80 MFCC coefficients (80 MFCC coefficients, Centroid, Flux, RMS, SFM = 107 dimensions)
(base) soohyun@DN0a1e4a7e hw2 % chuck x-validate.ck:model-SFM-MFCC80.txt
# of data points: 1000 dimensions: 107
fold 0 accuracy: 0.4515
fold 1 accuracy: 0.4721
fold 2 accuracy: 0.4397
fold 3 accuracy: 0.4789
fold 4 accuracy: 0.4613
>> Getting worse as the number of MFCC coefficients increases
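The dimension counts in the headings above can be double-checked with a little bookkeeping. This is a minimal sketch that assumes, from the logs rather than from the ChucK code, that Centroid, Flux, RMS, and Kurtosis each add one value per frame and that SFM adds 24 values (47 - 23 = 24); `feature_dim` and `SFM_BINS` are hypothetical names.

```python
# Hypothetical bookkeeping helper; SFM_BINS is inferred from the logs (47 - 23),
# not read from the actual feature-extraction code.
SFM_BINS = 24


def feature_dim(n_mfcc: int, centroid_flux_rms: bool = True,
                kurtosis: bool = False, sfm: bool = False) -> int:
    """Return the length of one feature vector for a given configuration."""
    dim = n_mfcc
    if centroid_flux_rms:
        dim += 3  # Centroid, Flux, RMS: one scalar each
    if kurtosis:
        dim += 1  # Kurtosis: one scalar
    if sfm:
        dim += SFM_BINS  # SFM: one value per band (assumed 24 bands)
    return dim


configs = {
    "model-23": feature_dim(20),
    "model-20": feature_dim(20, centroid_flux_rms=False),
    "model-full": feature_dim(20, kurtosis=True, sfm=True),
    "model-add-Kurtosis": feature_dim(20, kurtosis=True),
    "model-SFM-MFCC20": feature_dim(20, sfm=True),
    "model-SFM-MFCC40": feature_dim(40, sfm=True),
    "model-SFM-MFCC60": feature_dim(60, sfm=True),
    "model-SFM-MFCC80": feature_dim(80, sfm=True),
}

for name, dim in configs.items():
    print(f"{name}: {dim}")
```

Every count matches the `dimensions:` value printed by x-validate.ck for the corresponding model file.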
- Halving the hop size (more analysis frames) does not improve performance.
- Using only MFCCs performs worse than the baseline, so Centroid, Flux, and RMS help.
- Adding SFM gives the best performance. But adding Kurtosis together with SFM turns it into a disaster.
- With SFM added, performance gets worse as the number of MFCC coefficients increases.
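The conclusions above compare configurations by eye across five folds. Averaging the logged fold accuracies makes the comparison explicit; this sketch only uses the numbers copied from the logs, and the short dictionary keys are hypothetical labels for experiments 1 to 9.

```python
from statistics import mean, stdev

# Per-fold accuracies copied from the x-validate.ck output above.
results = {
    "baseline":     [0.4275, 0.4338, 0.4221, 0.4304, 0.4608],  # 1. 23-d
    "half_hop":     [0.3985, 0.4328, 0.4343, 0.4373, 0.4309],  # 2. 23-d
    "mfcc_only":    [0.4328, 0.3887, 0.3819, 0.4054, 0.4152],  # 3. 20-d
    "kurtosis_sfm": [0.1078, 0.0980, 0.1127, 0.0931, 0.0975],  # 4. 48-d
    "kurtosis":     [0.4333, 0.4132, 0.3951, 0.4392, 0.4319],  # 5. 24-d
    "sfm_mfcc20":   [0.4853, 0.4814, 0.4917, 0.5005, 0.5113],  # 6. 47-d
    "sfm_mfcc40":   [0.4794, 0.4549, 0.4431, 0.5074, 0.5132],  # 7. 67-d
    "sfm_mfcc60":   [0.4863, 0.4397, 0.4775, 0.4632, 0.4735],  # 8. 87-d
    "sfm_mfcc80":   [0.4515, 0.4721, 0.4397, 0.4789, 0.4613],  # 9. 107-d
}

# Mean and sample standard deviation over the five folds.
summary = {name: (mean(accs), stdev(accs)) for name, accs in results.items()}

# Print configurations from best to worst mean accuracy.
for name, (m, s) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:14s} mean={m:.4f} std={s:.4f}")
```

Sorted by mean, SFM with 20 MFCCs comes first (about 0.494), and the SFM runs decline steadily from 20 to 80 MFCCs (about 0.494, 0.480, 0.468, 0.461), while the Kurtosis + SFM run sits at about 0.102.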
Demo Audio (Prototype)
A typical blues guitar duel: one player plays lead guitar while the other plays rhythm guitar, and they switch roles later.
Demo Video:
Reflection:
This project is basically an AI guitarist that I can jam with! It imitates the form of a typical two-player blues guitar jam (or, say, a blues guitar duel), where one player plays lead guitar while the other plays rhythm guitar, and they switch roles later. So the system has two modes: in lead guitar mode it plays lead while I play rhythm, and in rhythm guitar mode it plays rhythm while I play lead. Pressing the spacebar switches the system between the two modes.
For this project, I made my own guitar sound datasets for both the lead guitar mode and the rhythm guitar mode by recording my own guitar playing. Every guitar sample is one or two measures long at 84 BPM. This tempo was chosen so that the real-time similarity retrieval loop works:
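fft_size + NUM_FRAMES * (fft_size / m) = (4 beats) * (60 / BPM seconds per beat) * (44100 samples per second),
where hop_size = fft_size / m. The left-hand side is one FFT window plus NUM_FRAMES hops, in samples; the right-hand side is one 4-beat measure, in samples.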
Suddenly I was facing an indeterminate equation, as if I were solving a practice problem in an elementary number theory class. Thank God 44100 has so many divisors that this equation can hold. BPM = 84 is one of the best divisors to choose, as it allows hop_size = fft_size / 2 and NUM_FRAMES = 61 (giving fft_size = 4000 and hop_size = 2000 samples). The number 84 was like a savior to me for this project, as it is also a nice tempo to play guitar at.
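The tempo equation can be checked with exact rational arithmetic. This is a minimal sketch using Python's standard `fractions` module; the helper names are hypothetical, and it assumes 4 beats per measure at 44100 Hz, as in the equation above. Solving for fft_size gives fft_size = measure_samples * m / (m + NUM_FRAMES), and a solution is only usable if both fft_size and hop_size come out as whole numbers of samples.

```python
from fractions import Fraction

SAMPLE_RATE = 44100      # samples per second
BEATS_PER_MEASURE = 4    # assumes 4/4 time


def measure_samples(bpm: int) -> Fraction:
    """Exact number of samples in one 4-beat measure at the given tempo."""
    return Fraction(BEATS_PER_MEASURE * 60 * SAMPLE_RATE, bpm)


def solve_fft_size(bpm: int, m: int, num_frames: int):
    """Solve fft_size + num_frames * (fft_size / m) = measure_samples(bpm).

    Returns (fft_size, hop_size) if both are whole numbers of samples,
    otherwise None.
    """
    fft_size = measure_samples(bpm) * m / (m + num_frames)
    hop_size = fft_size / m
    if fft_size.denominator == 1 and hop_size.denominator == 1:
        return int(fft_size), int(hop_size)
    return None


print(measure_samples(84))           # samples in one measure at 84 BPM
print(solve_fft_size(84, 2, 61))     # m = 2, NUM_FRAMES = 61
print(solve_fft_size(85, 2, 61))     # a nearby tempo with no integer solution
```

At 84 BPM one measure is exactly 126000 samples, and the solver returns (4000, 2000); at 85 BPM the measure length is not even a whole number of samples, so it returns None.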
To make the system's output harmonically correct and musically sensible, I kept all of the audio data (my recorded guitar samples) in the same scale. The system is therefore very restricted in terms of musical versatility. I actually considered using chroma features so that the system could detect the scale of the input, but I did not have enough time to explore that far...
To be honest, I think this system is too sensitive, in a bad way. The output reacts strongly to subtle changes in the volume pedal and in my picking strength. I tried several weight values for the feature vectors, but the system still remained too sensitive. So in the end, it was me overcoming the system's bad sensitivity by training my own sensitivity to it. Luckily, I am a well-trained enough guitarist to do so. But a user who cannot finely control the dynamics of their playing would find this system really difficult to play with.
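A toy illustration of why a small change in picking strength can flip which sample gets retrieved, and how feature weights act on that. This sketch assumes retrieval is a 1-nearest-neighbor search with a weighted Euclidean distance; the two-feature layout, the numbers, and the weights are all made up for illustration and are not the system's actual features or weights.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance where each squared difference is scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest(query, database, weights):
    """Index of the database vector closest to the query (1-nearest neighbor)."""
    return min(range(len(database)),
               key=lambda i: weighted_distance(query, database[i], weights))


# Toy 2-D feature vectors: [timbre-like value, loudness-like value].
database = [
    [0.2, 0.8],  # sample 0: loud
    [0.5, 0.4],  # sample 1: softer, with a different timbre
]
soft_pick = [0.25, 0.5]
harder_pick = [0.25, 0.6]  # same timbre, slightly louder

equal = [1.0, 1.0]
less_loudness = [1.0, 0.1]  # hypothetical weights that down-weight loudness

print(nearest(soft_pick, database, equal), nearest(harder_pick, database, equal))
print(nearest(soft_pick, database, less_loudness),
      nearest(harder_pick, database, less_loudness))
```

With equal weights, the soft and the slightly harder pick retrieve different samples (1, then 0); with loudness down-weighted, both retrieve sample 0. In the real system, with many more dimensions, reweighting alone was not enough, as described above.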
Despite its limitations, I am happy that I could actually record a nice guitar duet with the system that makes musical sense. I was really worried that it would not work as I expected... haha.
Acknowledgement: I started from the sample code mosaic-synth-osc-kb.ck (v1.3) by Ge Wang and Yikai Li.