Critical Response #1: What do you (really) want from AI music generation?

Aaron H.
Music 356A, Stanford University

Wow, MusicLM is scary, yet I wonder whether it is really a sign that AI jeopardizes the musical process. The main thing that continues to stand out to me about these models is how the composition of the data sets and the decisions made by the paper's authors change the output. Even with the enigma of what the models' dimensions mean or how the algorithms are learning, I still feel that the ‘artistic’ choices of the engineers shape the art that is made. For example, I found it very interesting which voices were produced in the pop and jazz examples. Surprisingly, the voice seemed to stay consistent, almost as if a specific singer whose features matched the model had been curated during synthesis. For one, it made me wonder why the model chose this voice. Questions of copyright also came up as I thought about how AI has been used in music before, as with the AI rapper FN Meka, which used the voice of the rapper Kyle the Hooligan. Unfortunately, the owner of FN Meka did not properly credit Kyle the Hooligan and ghosted him as the AI made millions. It made me wonder whether fusing multiple voices in a data set complicates who owns the rights to the music, especially if someone's voice is being used.

Another ‘artistic’ choice I found interesting in the MusicLM paper is the experts who were tasked with describing the songs. The website does show some of the descriptions used to label each of the songs, as well as descriptions of famous paintings to provide further context. However, even granting the value of an expert's input, it made me wonder what experiences we might be leaving out when describing sounds and the essence of music. For example, how would this model change if we surveyed people from around the world and asked them to describe what each song represents? Or, if we were concerned about response bias, what if we compiled responses from social media platforms such as Twitter or Instagram to train the model?
I would imagine the results would be radically different, perhaps varying by the day or with current trends. To me, there seems to be a lot of room for experimentation that doesn’t necessarily require musical experts to provide critical labels for the data set. In the future, Google may release the model, and maybe then we will see individuals begin experimenting with this. I do wish, however, that the paper had made more effort to discuss how these choices could have affected the results. The last paragraph briefly touches on issues of copyright and how the model could potentially be tweaked in other ways. Giving more attention to this part of the article, however, might have shed light on the direction of this research or on how we might think critically about applying MusicLM and the models still to come.