I had an absolute blast working with Wekinator. It's probably the most exciting project we've done in this class so far, and I really enjoy this interactive AI design mindset. I can't wait to keep mastering this platform and working with this overall design philosophy. I kept these mini projects short and sweet and tried to pick a slightly different concept and architecture for each one. I love the idea of quickly prototyping different ideas in Wekinator. The one I had the most fun with and spent the most time on was the Wek-Conductor:
This is a gesture-based audio controller that uses a webcam to track the skeletal structure of the hand. I wanted to use a LeapMotion camera, but until I buy one, I have to make do with a regular webcam for tracking. I used an outside app, HandPose-OSC by GitHub user faaip, which wraps MediaPipe Handpose (a pretrained TensorFlow model) inside an Electron app and outputs OSC. This OSC data is sent to Wekinator, where I trained a neural network that takes in all 63 hand-tracking values (21 landmarks, each with x, y, and z coordinates) and outputs 4 continuous parameters. These four parameters corresponded to:
To me this feels like an incredibly expressive and satisfying live-remixing tool. It was fun to warp a piece of music that I've spent so much time writing, producing, and mixing, since it let me be creative at a more macro level. I definitely plan on developing this interactive system further and using it in my artistic practice.
Click here to watch Wek-Conductor in action
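To make the input side of Wek-Conductor a bit more concrete, here is a minimal Python sketch of what one 63-value hand frame looks like on its way to Wekinator. HandPose-OSC already does this for you, and its actual OSC address and argument layout may differ. This sketch assumes Wekinator's default input settings (port 6448, address /wek/inputs), flattens 21 (x, y, z) landmarks into one list of floats, and hand-rolls the OSC encoding with the standard library just to stay self-contained. The function names are hypothetical.

```python
import socket
import struct


def _osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC strings require."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc_floats(address: str, values: list[float]) -> bytes:
    """Build an OSC message whose arguments are all big-endian 32-bit floats."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", v) for v in values)
    return _osc_pad(address.encode("ascii")) + _osc_pad(type_tags.encode("ascii")) + payload


def flatten_landmarks(landmarks: list[tuple[float, float, float]]) -> list[float]:
    """Turn 21 (x, y, z) hand landmarks into one 63-value input vector."""
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    return [coord for point in landmarks for coord in point]


def send_to_wekinator(landmarks, host: str = "127.0.0.1", port: int = 6448) -> None:
    """Send one hand frame over UDP (assumes Wekinator's default input port/address)."""
    message = encode_osc_floats("/wek/inputs", flatten_landmarks(landmarks))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

In a capture loop you would call `send_to_wekinator(points)` once per video frame, where `points` is a list of 21 (x, y, z) tuples. Wekinator then treats each message as one 63-input example for the neural network.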
I did end up refining the beatbox transformer that I outlined in my last progress update. By switching from classification to dynamic time warping (DTW), I was able to map vocalizations to the drum samples more accurately and consistently. This introduced a new issue: as a side effect of using DTW, I had difficulty triggering the same sample twice in a row, since the gestural mapping was built on relative change from the previous state. There was less latency overall, though, which made it more usable. However, I realized that even a small amount of latency renders this ineffective, since beatboxing and drumming rely on fine-grained rhythmic placement. That being said, I think a functionally acceptable level of latency is certainly achievable, but it would probably require a more complicated model and much more fine-tuning.
Click here to watch a short screengrab of Beatboxinator using DTW
Download source code
This was my first project and also the least involved one. It is built on FaceOSC using the starter code. I wanted to use gesture recognition with dynamic time warping here too, except that the gestures came from visual tracking of my facial movements. I would speak the words "one", "two", "three", "four", and "five" at a variety of volumes and speeds and in different contexts (e.g., coming out of silence, or immediately after finishing another word). These were simply mapped to scale degrees 1-5 of the major scale, played using a synth oscillator. It doesn't work very well, and I learned that I had to really exaggerate my mouth movements for Wekinator to distinguish between the spoken words. It was helpful, though, for getting acquainted with dynamic time warping, which I used heavily in my other two projects.
Click here to watch a short screengrab of FaceCount not doing very well
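The mapping half of FaceCount is simple arithmetic: each recognized word picks a major-scale degree, which becomes an oscillator frequency via 12-tone equal temperament, f = 440 * 2^((n - 69) / 12) for MIDI note n. This Python sketch assumes a C4 tonic (MIDI note 60), since the writeup doesn't say which key was used, and the function name is hypothetical.

```python
MAJOR_SCALE_OFFSETS = [0, 2, 4, 5, 7]  # semitones above the tonic for degrees 1-5


def degree_to_frequency(degree: int, tonic_midi: int = 60) -> float:
    """Map a major-scale degree (1-5) to an equal-tempered frequency in Hz.

    tonic_midi=60 (C4) is an assumed default, not taken from the project.
    """
    if not 1 <= degree <= len(MAJOR_SCALE_OFFSETS):
        raise ValueError("degree must be between 1 and 5")
    midi_note = tonic_midi + MAJOR_SCALE_OFFSETS[degree - 1]
    return 440.0 * 2 ** ((midi_note - 69) / 12)
```

With the C4 default, saying "one" gives roughly 261.63 Hz and "five" gives roughly 392.00 Hz (G4).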
(Older updates below)
I have been dabbling in a lot of different little areas using Wekinator. I feel there is so much potential in this system, and I can't wait to feel fluent enough with it to start using it in my music and art. So far, however, I have had a lot of technical difficulties just getting things to work smoothly. For this checkpoint I'm demoing one of my projects: a beatbox transformer! I thought this would be wayyy easier than it was. The basic idea is for the user to beatbox or make mouth noises into the mic, thereby triggering different drum samples. Using Wekinator, users can map different sounds to different samples. The basic system I developed works, but the classification of beatboxing noises is not very reliable, and there is considerable delay, which makes it difficult to use. I spent the vast majority of my time fine-tuning parameters, tweaking things, and trying to learn how to play the instrument I've got (this is a "dumb" instrument that I had to adapt to and learn how to exploit in order to make it work). I spent a ton of time just trying to make clear and distinct noises, practicing my performance. The latency is a huge drawback, and I noticed it almost creates that speech-scrambler illusion where slightly delayed audio feedback sort of fries your brain and renders you unable to speak normally.
In any case, the system works, but I feel like I haven't achieved the immediate feedback that would make it fun and easy to use. I also haven't quite found the optimal audio analysis parameters to feed into Wekinator as input. Currently, mic input goes into a feature collector with the following analyzers: centroid, flux, RMS, 50% rolloff, and 85% rolloff. Each of these undergoes some additional processing to help maximally differentiate the kinds of mouth noises I make when beatboxing. The result is then fed into a Wekinator classifier with 3 categories (I had more, but it kept confusing hi-hats, cymbals, and snare). To be honest, even though I could keep optimizing this design to make a smoother beatboxer, I would rather explore other uses of Wekinator. So I will probably focus on another instrument that takes in camera input and modulates some continuous synth parameters rather than outputting classes.
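For reference, here are textbook versions of the five features in that list, written in plain Python over one block of samples and its magnitude spectrum. ChucK's Centroid, Flux, RMS, and RollOff analyzers may normalize differently (for example, reporting values relative to the spectrum size rather than in Hz), and the extra per-feature processing mentioned above isn't reproduced here, so treat this as a sketch of what each number measures rather than the project's actual analysis chain. The function names and the `bin_hz` parameter (frequency spacing between spectrum bins) are hypothetical.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square level of a block of time-domain samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def centroid(mags: list[float], bin_hz: float) -> float:
    """Magnitude-weighted mean frequency of a spectrum, in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def flux(mags: list[float], prev_mags: list[float]) -> float:
    """Euclidean distance between successive magnitude spectra."""
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(mags, prev_mags)))


def rolloff(mags: list[float], bin_hz: float, percent: float) -> float:
    """Frequency below which `percent` of the total spectral magnitude lies."""
    threshold = percent * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= threshold:
            return k * bin_hz
    return (len(mags) - 1) * bin_hz


def feature_vector(frame, mags, prev_mags, bin_hz):
    """Five features in the same order as the analyzer list above."""
    return [
        centroid(mags, bin_hz),
        flux(mags, prev_mags),
        rms(frame),
        rolloff(mags, bin_hz, 0.50),
        rolloff(mags, bin_hz, 0.85),
    ]
```

Sharp, noisy sounds like hi-hats tend to push the centroid and rolloff values up, while kick-like sounds sit low, which is why these spectral shape features are a common starting point for telling mouth noises apart.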
I could not have done these projects without the amazing resources from Kyle McDonald (FaceOSC), faaip (HandPose-OSC), and especially Rebecca Fiebrink (Wekinator) and Ge (ChucK + everything else). I've been excited to do some Interactive AI since reading "Humans in the Loop" last quarter, and I feel like this was a great introduction.