Critical Response 1 - What do I really want from AI musicgen?

By: Dominic DeMarco

What is MusicLM, really?

MusicLM is a really cool toy, and yet I can't shake the feeling that my categorization of it as a toy is foolishly nearsighted. Right now, trying to predict the future of this technology is like trying to see the path ahead in thick fog. I'm sure that music generation will only advance with time, but how twisted the path will be remains quite mysterious to me.

It's engaging to play with the demo website, listening to the sample outputs and contrasting different versions of "Ode to Joy." Nevertheless, every output feels very algorithmic. In the "Ode to Joy" example, I was listening most closely to the background. The melody generation was fine, but things really fall apart when you listen closely to the negative space in the music. The model could not mimic key styles, such as string quartets or piano solos (although it did ok in the jazz setting), and the harmonic content was static throughout each sample. Additionally, listening to the examples for a given semantic token set revealed the limits of generation diversity: all the examples in a category featured the same melody and similar harmonies, though the exact soundscape would differ slightly. Heard side by side, these examples make MusicLM feel less expressive and more deterministic.

Given the examples that have been publicly released, I feel that MusicLM is an excellent imitator but isn't creative. It can clearly memorize melodies and pick up on timbral features of musical genres, making it capable of generating basic soundscapes that could serve as a wonderful starting point. Its output is still well behind AI models for image synthesis (music adds a time component, which brings significant complexity).

What do I want from MusicLM (or musicgen, more broadly)?

In general, I want AI to reduce the amount of tedious work that is still performed by humans due to some non-trivial complexity. While music composition is a highly creative domain, there are certainly aspects of it that are tedious. MusicLM could suggest melodies, progressions, structures, instrumentation choices, and much more, even if it struggles to create "new" music of its own. In a sense, it could lower the barrier to starting a composition, making it somewhat of a learning tool, though it is important not to conflate difficult-to-learn work with tedious work. MusicLM could also be useful for generating stock audio or sound effects. I can already imagine AI-generated commercial music playing in the background of an ad for 2025's hottest new car model, and AI-generated soundtracks backing hours of mind-numbing corporate training videos. (This is an aside - corporate training videos are where music goes to die - but it is fitting to listen to 30 minutes of music that goes nowhere while a cartoon character tells you not to plug random USB sticks into your work computer.)

In addition to reducing tedium, I'd love for AI music generation to interface with common DAWs and export separable stems. It would be so much easier to leverage its power if musicians could examine different components of its output and fine-tune them accordingly. The notion of AI-assisted DAWs (or any interoperable interface between humans and music AI) seems really cool to me. I'd love for MusicLM, and similar models, to become a new tool in the toolbox for human creators. As long as the AI doesn't outclass our creativity (and at the moment, it cannot), there's no real threat to human composers and creators. Let us leverage the incredible throughput of an AI that can endlessly generate sounds from text to seed our creativity.