I found Wekinator itself to be fun and approachable (my first time dealing with interactive machine learning!). For my milestone (first system), though, the webcam was a bit finicky to work with, since the slightest change in camera angle and/or lighting would confuse the model, forcing me to constantly recalibrate the dataset. Later, when I switched to VisionOSC (for my second and third systems) and filtered the data (i.e., only including the y-coordinates of the hand pose), I found training to be much more robust. I think inserting a better feature extractor (e.g., a small convolutional neural net for images) between the camera input and Wekinator could go a long way toward improving performance.
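The y-coordinate filtering step can be sketched in a few lines. This is a minimal illustration, assuming VisionOSC delivers hand landmarks as a flat `[x0, y0, x1, y1, ...]` list (the actual address and layout may differ); forwarding the filtered values to Wekinator over OSC is left out here.

```python
# Sketch of the feature-filtering step between VisionOSC and Wekinator.
# Assumption: landmarks arrive as a flat list of alternating x, y values.

def filter_hand_features(landmarks):
    """Keep only the y-coordinates, shrinking Wekinator's input
    dimensionality and reducing sensitivity to horizontal camera shifts."""
    if len(landmarks) % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")
    return landmarks[1::2]  # every second value, starting at y0

# Example: three (x, y) landmarks reduce to three y-values
features = filter_hand_features([0.1, 0.8, 0.5, 0.6, 0.9, 0.4])
```

In practice the filtered list would then be packed into a single OSC message (e.g., with a library like python-osc) and sent to Wekinator's input port.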
For each system, I tried taking a unique approach. My first system, Productivity Booster (relay code), is sort of a satire on the notion of AI-as-optimization: it uses a trained image classifier to detect whether you’re “being productive” or browsing social media. My second system, Face 2 PinkTrombone (relay code), was mainly motivated by an interest in PinkTrombone, for which I sought to create a controller (and what better way to control a vocal tract simulator than with actual vocal tract coordinates!). On GitHub, I found a port of the PinkTrombone app with OSC control and hooked it up to Wekinator, but unfortunately, the overhead of running a web server locally (which is how the OSC messaging was set up), combined with face tracking, introduced a ton of latency, making the system difficult to interact with and develop. Finally, for my third system, Pocket Theremin (relay code) (the required “expressive musical instrument”), I wanted to focus on mapping and performativity. I devised a simple instrument with three controls, each mapped to hand position: loudness (how open your hand is), pitch (height of hand), and vibrato (rotation of hand).
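The three Pocket Theremin mappings can be sketched as plain functions. This is a hypothetical illustration, assuming the hand-tracking features arrive normalized to [0, 1]; the specific ranges and the exponential pitch curve are my assumptions (in the actual system, Wekinator learns the mappings from training examples rather than from hand-written formulas).

```python
# Hypothetical sketch of the Pocket Theremin's three controls.
# Assumption: each input feature is normalized to [0, 1].

def loudness(openness):
    """Hand openness (0 = closed fist, 1 = fully open) -> amplitude."""
    return max(0.0, min(1.0, openness))

def pitch_hz(height, low=220.0, high=880.0):
    """Hand height -> frequency. Exponential, so equal hand motion feels
    like equal musical intervals (two octaves here: A3 to A5)."""
    h = max(0.0, min(1.0, height))
    return low * (high / low) ** h

def vibrato_depth(rotation, max_depth=0.05):
    """Hand rotation -> vibrato depth, as a fraction of the base pitch."""
    return max(0.0, min(1.0, rotation)) * max_depth
```

A nice property of the exponential pitch mapping is that the hand at mid-height lands on the geometric mean of the range (440 Hz here), rather than the perceptually lopsided arithmetic midpoint a linear mapping would give.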
Reading Response
“Every instantiation of a design is an argument or vote for how we want to live”
---
I came into this course without any concrete expectations or objectives. Mainly, I hadn’t taken a CCRMA class in a while and wanted some therapy for my “AI FOMO”. Yet after 8 weeks, perhaps for the first time, I feel… inspired? to contribute to AI research (but not in the traditional sense, or in a way I would have expected).
While working on the three assignments, I often felt frustrated by how clunky the AI tools were to work with. As an amateur composer who’s used to (and enjoys) tuning musical details by hand, the randomness and unpredictability of Word2Vec caught me off guard. (Although Word2Vec in particular is an old model, I get the sense that the requirement of real-time musical interaction would still limit the performance of any newer model.) Even when I was able to hand-tune the feature extractor for my audio mosaic or my Wekinator model, I did not find the training (and validation) process to be particularly enjoyable.
All this is to say that perhaps for the first time, I had to critically evaluate what I (and maybe similarly minded musicians) really want from AI in music. And for the most part, my answer doesn’t lie in better-performing word embedding models or optimized audio feature extractors. Actually, when it comes to amateur music-making, I’m not sure I really even want or need anything. The probabilistic nature of AI makes it useful for generating countless variations of a musical statement or for quickly filling in parts, which could serve a practical purpose (e.g., creating stock music or song demos). But when nothing’s at stake, the joy of composing comes from agency and control, and I don’t think anything should take away from that.
So where does this leave me? Instead, I think AI should be whimsical. It should do crazy things that a human musician might feel ashamed of doing themselves. It shouldn’t seek to entirely replace parts of the music-making process; rather, it should help the composer try something new or see things in a different way. AI is becoming prevalent in our daily lives, and instead of adopting a “go-with-the-flow” mentality, perhaps it’s time to push back against the status quo of AI-as-optimization. Because I fear a scenario where, in trying to optimize something like music, we lose sight of the joy in the process.
Ultimately, I think that whatever one’s thoughts on AI are, they can meaningfully contribute to the discourse on how AI should be used. To conclude this response, I think one potential area for change is what the CS research community considers to be good AI research. Currently, much of the focus is on algorithms and architectures, with work on human-in-the-loop AI systems relegated to the field of HCI instead (so not even “AI research”). My hope is that by showing the AI research community something exciting and unconventional, other people might find it interesting, so that we may all cast a vote on how we want to live with AI.