Reading Response #5
to Artful Design • Chapter 5: "Interface Design" + Interlude: "Dialogue with a Zen Master"

Sam L.
10.24.21
Music 256A / CS476a, Stanford University


Reading Response: Opening the Black Box

The idea from this week's reading that I will be responding to is Design Principle 5.5:

    Principle 5.5: Have your machine learning AND the human in the loop

Machine learning, and AI more generally, has always been one of the topics in computer science that I find most fascinating. I did come to Stanford to specialize in AI, after all! Funnily enough, my journey as an artificial intelligence researcher also started at Stanford, and at CCRMA specifically. After discovering my love for programming as a freshman at UNC, I came to CCRMA the following summer to attend two of the summer workshops: the first on music information retrieval and the second on extending that toolkit with the power of deep learning.

As I've developed my skills and worked on many AI-powered projects, I have come to realize that applying AI to music, more often than not, simply highlights the shortcomings of the neural network-based approaches that have found so much success and popularity since the start of the most recent AI boom. Any system that comes into contact with real people at any point in its operation needs to be able to refine itself based on the preferences and inputs of the human in the loop. In older, more statistically and theoretically grounded approaches to intelligence, this was a more achievable goal, because we understood the mechanisms at play and how to use additional input to modify the system's output. However, we are only now beginning to grasp the inner workings of neural models, and for the most part they are still black boxes whose intricacies are unknowable even to their designers. In a sense, they learn highly complex but still static mappings: once training ends, the mapping is fixed. How can you take such a system and have it adjust on the fly to the whims of its operator?

This is certainly an "open question" in the field, but it's an idea that I've been meditating on ever since my first conversation with Ge on the topic back in 220b. I'm currently of the opinion that such a tightly coupled system is simply not possible with the toolkit we have today for developing artificial neural networks. Sure, you could take the desired input-output pair as an additional training point, but if you are training at the scale that many deep networks require, a single new example will not really move the needle, statistically speaking. I believe applications to creativity require a synthesis of modern methods with older ideas in the field, like those from symbolic logic, which can orchestrate the system at a higher level and respond more nimbly to the user, while still leveraging the power of neural networks to perform certain "atomic calculations" within the system. The desire to build such systems is exactly what led me to take a class on design in the first place, so my thinking is liable to change in the near future :)