Critical Response #2: "Power to the People / Humans in the Loop"

Max Jardetzky
MUSIC 356 (Winter 2023)

Before reading, consider the following articles:

S. Amershi et al. 2014. "Power to the People: The Role of Humans in Interactive Machine Learning."
G. Wang. 2019. "Humans in the Loop: The Design of Interactive AI Systems."

Part 1:
I greatly appreciate the critique in the HAI reading that frames some modern AI systems as Big Red Buttons. I have personally experimented with WOMBO's free Dream app, which generates art from a text prompt at the click of a button. As an end user whose only job is to type some words and click a button, I feel stripped of the agency I have with more conventional tools like Photoshop. I like that the field of interactive machine learning is emerging as an answer to Big Red Button AI systems, which I feel will proliferate in the coming years. The sad reality is that many AI systems at scale (think Apple's image post-processing algorithms) would lose their effortless convenience if they became interactive, to say nothing of the risk of divulging trade secrets. At the same time, I already see principles of interactive machine learning in use today, but perhaps for the wrong purpose. Take Netflix's recommendation system, which gathers thumbs-up and thumbs-down ratings from users to recommend content more effectively. When I help optimize a system that exists to squeeze as much money out of my time as possible, I as a user am not the one benefiting from the interactivity. This is why I prefer not to tune the AIs that try to profile me, except to tone down Meta's aggressive Reels recommender: I can click on a profile once and be shown that person's Reels for weeks afterward. The algorithms used in modern social media are really starting to scare me.

I also worry that interactive ML at scale wouldn't perform as well, since transparency in AI may be inversely correlated with performance. Compare a Naïve Bayes classifier to a transformer and ask which one is easier to unpack and which one performs better. Favoring transparency is all well and good for artistic applications like Wekinator, but if we really want to automate driving with computer vision, can we afford to have anything but the most opaque networks? After all, we have yet to crack the neural code and understand the human brain as it is; would we rather lobotomize our creative and cognitive powers to make them intellectually graspable? I'm not proposing a paradigm one way or the other; I only want to challenge the implications of taking AI development to either extreme. There are clear benefits and downsides to systems that are more interactive and transparent, as well as to systems that are less so. In addition, even if we settle on an approach that strikes the right balance of ethics and effectiveness, who enforces these principles on private interests? Should the government involve itself in regulating the cutting edge of machine learning research, just as the FDA regulates clinical trials of experimental gene therapies (CRISPR)? Every day, I get closer to wanting some external, not-for-profit oversight of AI development, just to protect humanity from slippery-sloping its way into a more oppressive future. I'm glad this class exists as a drop in the bucket of the global ethical consideration needed in AI.

Part 2:

Computer vision system that identifies injury-prone weightlifting form
Computer vision system that identifies pauses and hesitation in Rubik’s Cube speedsolving trials so solvers can improve
Computer vision system that analyzes drone footage to identify early signs of crowd crush at large music festivals
Computer vision system that suggests accessories and other clothing items to pair with a certain garment, with links to buy
Computer vision system that labels possible medical emergencies early and suggests tasks during the crucial minutes before support arrives
Integrated input-analyzer AI system in a first-person shooter video game that identifies mistakes and inaccuracies in movement, crosshair placement, and decision-making and helps players train them out
Music information retrieval system that extracts key and chord progressions from a song
Computer vision system that scans buildings for code violations and fire hazards
Speech recognition system for real-time machine translation with user-level feedback ("Is this what you meant to say?")
Computer vision system to analyze specialty coffee grind size for brew method and desired extraction level against a baseline

This critical reading response was not written with the assistance of ChatGPT.
