Etude 3
Wekinate Your World
System 1: Motor Radio
Wanna have some music while riding your motorcycle? Separate your fists to make the music louder. If you want to hear soft folk music, place your fists lower; if you want to hear hard rock music, place your fists higher.
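Below is a minimal sketch (not the actual implementation) of how Motor Radio's two continuous Wekinator outputs could be read over OSC, assuming Wekinator's default output address /wek/outputs on port 12000 and the python-osc package; the volume/genre mapping and the 0.5 threshold are illustrative choices, not values from the real system.

```python
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_wek_outputs(address, *values):
    # Expect two continuous outputs in [0, 1]: fist separation and fist height.
    if len(values) < 2:
        return
    separation, height = float(values[0]), float(values[1])
    volume = max(0.0, min(1.0, separation))                # wider fists -> louder music
    genre = "hard rock" if height > 0.5 else "soft folk"   # higher fists -> rock
    print(f"volume={volume:.2f}  genre={genre}")

dispatcher = Dispatcher()
dispatcher.map("/wek/outputs", on_wek_outputs)
BlockingOSCUDPServer(("127.0.0.1", 12000), dispatcher).serve_forever()
```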
System 2: Hand Timer
Tired of setting a timer on your smartphone? Indicate the time length with hand gestures and start the countdown with the letter "O" gesture. You will hear the alarm when the time is up.
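The following sketch shows one way the classifier output could drive the timer logic; it is an assumption-laden illustration, not the real code. The class labels (1 through 5 meaning minutes, 6 meaning the "O" gesture) are hypothetical, and Wekinator's default /wek/outputs address on port 12000 is assumed.

```python
import threading
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

state = {"minutes": 1}

def ring():
    print("Time is up!")  # stand-in for playing the alarm sound

def on_gesture(address, label):
    label = int(label)
    if label == 6:                      # hypothetical class 6: the "O" gesture
        threading.Timer(state["minutes"] * 60, ring).start()
        print(f"Timer started for {state['minutes']} min")
    elif 1 <= label <= 5:               # hypothetical classes 1-5: length in minutes
        state["minutes"] = label
        print(f"Length set to {label} min")

dispatcher = Dispatcher()
dispatcher.map("/wek/outputs", on_gesture)
BlockingOSCUDPServer(("127.0.0.1", 12000), dispatcher).serve_forever()
```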
System 3: Fist Block
Hard to find temple blocks with various pitches? Use your left fist instead! Tap different knuckles to trigger different pitches and make music using just two hands.
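As a rough sketch of the idea (again, not the actual system), each knuckle class from the Wekinator classifier could be mapped to a pitch; the four classes and the frequencies below are purely illustrative, and the print statement stands in for triggering a synthesized block sound.

```python
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

# Illustrative pitches only; the real system may use different tunings.
PITCHES_HZ = {1: 220.0, 2: 294.0, 3: 370.0, 4: 440.0}

def on_knuckle(address, label):
    freq = PITCHES_HZ.get(int(label))
    if freq is not None:
        print(f"knuckle class {int(label)} -> play block at {freq:.0f} Hz")

dispatcher = Dispatcher()
dispatcher.map("/wek/outputs", on_knuckle)
BlockingOSCUDPServer(("127.0.0.1", 12000), dispatcher).serve_forever()
```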
Reflection
Hands are our most flexible body parts, and I have thought about doing a lot of things with them. With the help of Wekinator, three of those ideas were realized. Though I have trained several machine learning and even deep learning models in the past, this is my first time playing with interactive machine learning, and it feels completely different. In traditional machine learning, the dataset is often built by other researchers, so I have almost no control over its fine details, but the sample size is large and the quality is assured. Interactive machine learning, on the other hand, gives me far more control over the training data, and I can easily add new samples based on how the model is performing. However, the small sample size makes the model very sensitive to bad examples, so the quality of each sample is extremely important. Accordingly, when I was training Motor Radio and Fist Block, whenever the performance got dramatically worse, instead of adding more training samples I deleted all the past samples and started over with better recordings.
During this etude, I tried two different types of webcam input: raw pixels and hand landmarks. Since all three systems use hands to control the sound output, the benefit of hand tracking is obvious. For Motor Radio, where I used raw pixels as input, the model could not really tell the difference between my face and my hands because of their similar color, so I had to keep my face out of the camera and keep the apparent size of my fists consistent. In contrast, with hand tracking the background did not matter at all. As for output types, I experimented with both continuous numbers and discrete classifiers. In this interactive setting, classifiers were clearly easier to train and control than continuous outputs.
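For the hand-landmark input path, one possible wiring (a sketch under assumptions, not my actual input program) is to flatten MediaPipe Hands landmarks into a fixed-length feature vector and send it to Wekinator over OSC, assuming Wekinator's default input address /wek/inputs on port 6448 and the opencv-python, mediapipe, and python-osc packages.

```python
import cv2
import mediapipe as mp
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 6448)        # Wekinator's default input port
hands = mp.solutions.hands.Hands(max_num_hands=1)  # one hand -> fixed-length vector
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # Flatten the 21 landmarks' (x, y) coordinates into 42 input features.
        lm = results.multi_hand_landmarks[0].landmark
        features = [c for point in lm for c in (point.x, point.y)]
        client.send_message("/wek/inputs", features)

cap.release()
```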
Source Code