256A Reading Response #5
This is a response to a principle in Ge Wang’s book Artful Design: “Principle 5.5 Have your machine learning – And the human in the loop!” (pg. 218)
Hey Robot! Share! Please?
Machine learning is generally structured around “tasks”, rarely “tools”. There are countless papers and competitions about which algorithm or model can best classify the emotion of a facial expression, but far fewer on what to actually do with that classification. While this feels like a classic case of “we were so preoccupied with whether we could, we never stopped to ask if we should”, it gets at something a little deeper, I think:
Machine learning is hard!
It makes complete sense that for the first 20 or so years of using it, we didn’t really care too much about what it was doing. There were some early techniques (mainly tree-based methods like random forests) that could tell you what was important in the data and, essentially, what the model was actually learning. Is the algorithm distinguishing dogs from wolves? Or is it just looking for snow in the background? Recently, some exciting tools have emerged that examine different layers of a neural network to understand what they’re prioritizing, but the combinations and non-linearities are so enormous that the analysis often becomes intractable very quickly.
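As a minimal sketch of that older style of introspection (the features and the “snow” shortcut are invented here for illustration), a random forest’s feature importances can expose a shortcut the model is exploiting:

```python
# Sketch: tree-based models let you ask what the model actually keyed on.
# The features below are hypothetical stand-ins for image properties.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Invented features: the label secretly depends on "snow",
# not on anything about the animal itself.
snow = rng.random(n)
fur_color = rng.random(n)
ear_shape = rng.random(n)
y = (snow > 0.5).astype(int)  # 1 = "wolf", 0 = "dog"

X = np.column_stack([snow, fur_color, ear_shape])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ reveals what the model learned to rely on:
for name, imp in zip(["snow", "fur_color", "ear_shape"], clf.feature_importances_):
    print(f"{name}: {imp:.2f}")
# "snow" dominates -- the model learned the background, not the animal.
```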
But it’s this exact “black box” quality that could be leveraged in an interesting way, especially for fuzzy phenomena: things that are hard to define, more of an “I know it when I see it” kind of thing. And there are SO many fuzzy concepts in music (including fuzz pedals!). Example-based methodologies are exciting for this reason, but also because they have the opportunity to increase access to things that have historically been gate-kept. So it makes sense to leverage machine learning as a tool to get at these more abstract notions quickly and easily. Even trained musicians and highly skilled sound designers have woefully coarse language for timbre and texture and could benefit from example-based tools.
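Here’s a minimal sketch of what such an example-based tool might look like, assuming a hypothetical library of sound files and mean MFCCs as a stand-in timbre descriptor (librosa and scikit-learn for the plumbing). The interface is “show me”, not “describe it”: the user plays an example and the tool retrieves the closest-sounding match.

```python
# Sketch: query-by-example timbre search. Filenames and the feature
# choice (mean MFCCs) are assumptions for illustration -- any timbre
# descriptor could stand in.
import numpy as np
import librosa
from sklearn.neighbors import NearestNeighbors

def timbre_vector(path, sr=22050):
    """Summarize a sound's timbre as its mean MFCC vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# Hypothetical library of sounds the user wants to search.
library = ["warm_pad.wav", "glassy_bell.wav", "gritty_bass.wav"]
X = np.stack([timbre_vector(p) for p in library])

index = NearestNeighbors(n_neighbors=1).fit(X)

# Instead of finding words for a timbre, the user just plays an example.
query = timbre_vector("sound_i_like.wav")
_, idx = index.kneighbors(query[None, :])
print("closest match:", library[idx[0][0]])
```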
This is triply exciting because it has the potential to open the door for many folks of varied ability statuses. By making it possible to get approximately correct sounds quickly from only a few input parameters, example-based tools can make nuanced sound-making accessible to many more people.