Critical Response #1: "What do you (really) want from AI music generation?"

Max Jardetzky
MUSIC 356 (Winter 2023)

Before reading, consider the published examples of the MusicLM research project, as well as its research paper.

When I first encountered the examples on the MusicLM website, I was profoundly shocked at how well the text prompts came through in the generated audio. My immediate next question was: how dependent is the quality of a model's output on the quality of its training data and labels? I read the MusicLM paper in its entirety, and I kept raising my eyebrows at the assertion that its novel MusicCaps dataset of captioned music clips was expertly labeled and classified. Text descriptions of audio, as the paper itself acknowledges, are very sparse representations; that is, there is almost no way to accurately reproduce a clip of sound from a boiled-down text description alone. And this is before we even bring in the concept of genre and, more generally, musicking that can't easily be described within popular linguistic convention. To me, labeling another creator's artistic output with an impersonal textual description of its surface-level sonic elements reduces music to its least creative form. Music exists to tell stories, convey the depth of human emotion, and reinforce community. What good are we doing if we take the human out of the loop? I have followed the online response to the growth of AI-generated visual art, and I see similar threads running through those conversations. When we apply machine learning to any form of human art, I propose the inevitable existence of a so-called “tragedy of attribution.” The tragedy of attribution results from a process that obscures, if not eliminates, the spark of human inspiration, and art without artistic intent causes confusion. I believe this tragedy will persist even when the output quality of these generative models crosses the line of human perception: a musical Turing Test, so to speak.

The advent of AI-generated music challenges something fundamental to music creation and consumption in modern society: the human element. In my personal experience, the musical projects I am most attached to come from legendary songwriters (Porter Robinson, Madeon, Bon Iver, EDEN, and RÜFÜS DU SOL, for example). To me, it won't matter whether an AI can generate plausible vocal lines full of poetic verses if those lines don't originate from a human artist's life story or artistic universe. If AI-generated music becomes so nuanced and skillful as to be indistinguishable from human practice, I think many would consider that a sad day indeed. As discussed in THINK 66, there are some things we simply should not automate. One of the few promises of AI-assisted musical creation is that less human skill and time will be needed for creative realization. Defenders may say that this levels the playing field of musical practice, but I would ask whether there was ever a playing field at all. Anybody can create art and music that speaks to their own experience. However, just as in learning to cook, there are systems and rules of thumb to consider when architecting a work that will “taste” good to others. The human investment in developing these skills is exactly the artistic struggle, and bypassing that journey with AI models trained on the existing struggles of human artists is almost unconscionable. One definition of art is food for the human desire to explore, both introspectively and collectively, the confusing reality we simply found ourselves in one day. Show me an artificial intelligence with similar origins and metacognitive sensibilities, and then maybe I'll listen to one of its songs.

This critical reading response was not written with the assistance of ChatGPT.