Critical Response 2

Soohyun Kim


Part 1


"Let’s start by exploring why total automation may not be the endgame for AI we tend to think it is."


Between this article and our discussions in Music 356, we have seen enough to convince ourselves that fully automated tools are not the Holy Grail of AI music production. This has made me more interested in designing interactive and assistive AI music production tools for users ranging from novices to professionals. For novices in particular, I am interested in a semantically controllable automatic mixing system with which songwriters can easily record their songs the way they intend, even if they lack experience, knowledge, and technique in music production.


While I am now fully aware of the drawbacks of fully automated AI music production tools, I suddenly realized that I had never considered whether interactive and assistive tools have drawbacks of their own, simply because the fully automated alternative looks worse.


Could interactive and assistive AI tools have negative effects on human music production? If any beginner songwriter or music producer can wield such tools to get sounds that neatly fit their intention, is that always good for the evolution of popular music and music production?


The book [Repeated Takes: A Short History of Recording and its Effects on Music] by Michael Chanan points out that the evolution of popular music has been driven by mutations. Those mutations were the happy failures of young aspiring musicians who intended to imitate the sounds of their favorite established musicians but could not, because they lacked the technique and experience. These mutations were especially prominent in the 90s and the early 2000s, when young alternative rock bands and rappers with no professional music training or education came out with new sounds.


These happy failures gave the young aspiring musicians chances to find their own new style, as they stumbled onto a direction perpendicular to their original intention (to imitate their favorite musicians). These mutations increased the diversity of popular music, and that increase in diversity was its evolution.


The intentions of young aspiring musicians are often superficial, because they are not yet ready to form a new and creative intention of their own. In most cases, they just want to imitate the sounds of their favorite musicians or records. If interactive and assistive tools are always available to give them sounds that neatly fit such an intention, I think they will miss chances to encounter new sounds and find their own style. This can reduce the occurrence of happy failures, or mutations, which would cause diversity to collapse and converge on a narrow trend ("Every new musician sounds the same as the established musicians!"). In this way, such tools could eventually hinder the evolution of music production, that is, the increase of its diversity.


I think that even when we design interactive and assistive tools for aspiring musicians, we have to leave some room for them to encounter directions perpendicular to their original intention. Just like the 'temperature' knobs in many generative models, it would be good to leave some random elements in the system that invite exploration.


This is now another of my design goals for interactive and assistive tools for aspiring musicians.
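
As a rough illustration of the 'temperature' idea mentioned above, the sketch below shows how a single knob could control how often an assistive tool strays from the closest match to the user's intention. It is only a minimal sketch under stated assumptions, not a description of any existing tool: the candidate names, the scores, and the functions `softmax_with_temperature` and `pick_preset` are all hypothetical.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_preset(presets, scores, temperature, rng=random):
    """Sample one preset, weighted by its temperature-scaled probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(presets, weights=probs, k=1)[0]


# Hypothetical candidates: how well each one matches the user's stated intention.
presets = ["close imitation", "slightly off", "unexpected"]
scores = [3.0, 1.0, 0.0]

suggestion = pick_preset(presets, scores, temperature=1.5)
```

At a low temperature the tool would almost always return the closest imitation; raising it makes the less expected candidates more likely, which is where a happy failure could come from.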



Part 2


  1. Animated dancing videos for babies are hugely popular on YouTube, but they are one-directional: the babies react to pre-rendered videos, and the videos never react to the babies. If we used a generative model to make animated dancing videos that respond to a baby's gestures (the characters' dancing or some background elements change according to the baby's dance), the content would become more interesting.


  1. When you are in a band, you often see singers or guitarists who can't play the drums air-drumming with their arms to explain what they want from their drummer ("I want you to play the fill like this, with this much dynamics."). If we made a drum loop generator plugin for a DAW that uses a generative model to produce drum loops of the desired complexity from the user's air-drumming, it would be helpful to (novice) music producers who can't play the drums. The air-drumming doesn't have to be flawless; the system only needs to estimate the amount of complexity from the user's movement.


  1. Sometimes you see people looking at their phones with a huge smile while text messaging (and they don't even realize it themselves). It is sad that you can see that huge smile but the person receiving the message never will. It would be interesting to make an emotion detection system that reads emotion from the user's facial expression and changes the color or some visual effects of the text box accordingly. (But you may want to turn this feature off in most cases.)


  1. Similar to the previous idea, we can also imagine a predictive text feature that uses an emotion detection system to recommend words or phrases according to the emotion in the user's facial expression.


  1. An AI text editing tool that understands the context and tone of the user's writing, so that it can recommend or even generate fonts to match.


  1. An AI accompaniment system that generates a real-time backing track in reaction to a solo instrument player. (My last project was a basic version of this system.)


  1. Semantic parameter recommendation for digital audio effects: when the user wants to make the sound "warm", the system generates and suggests two or more parameter presets. The user chooses the best one among them, and that choice is fed back as new training data to retrain the system. The system thereby gets better at understanding the user's idea of a "warm" sound.


  1. An audio sample synthesizer that uses generative models but exposes semantically controllable knobs (e.g. brightness, warmth, room size, etc.) implemented through latent vector interpolation.


  1. A color palette generation or recommendation system: the user expresses the tone or emotion of the colors they want through facial expression and gesture, and the system detects that emotion to generate or recommend matching color palettes.


  1. Text-prompted image generation in which users can draw a circle or any closed contour to mark a specific part of the generated image and add a new comment, so that the system edits that part of the image according to the comment.
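
The "semantically controllable knobs" idea above relies on latent vector interpolation, which can be sketched roughly as follows. This is a minimal sketch under assumptions, not a working synthesizer: the anchor vectors `DARK` and `BRIGHT` and the function `knob_to_latent` are hypothetical, and a real system would still need a trained generative model to decode the resulting latent vector into audio, which is not shown here.

```python
def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def knob_to_latent(knob, low_anchor, high_anchor):
    """Map a knob position in [0, 1] to a point between two anchor latents."""
    t = min(1.0, max(0.0, knob))  # clamp so the knob cannot overshoot the anchors
    return lerp(low_anchor, high_anchor, t)


# Hypothetical latent codes standing in for a "dark" and a "bright" sample.
DARK = [0.0, 2.0, -1.0]
BRIGHT = [1.0, 4.0, 1.0]

# A "brightness" knob set halfway between the two anchors.
halfway = knob_to_latent(0.5, DARK, BRIGHT)
```

Each semantic knob (brightness, warmth, room size) would need its own pair of anchors, or a learned direction in the latent space; linear interpolation is only the simplest choice.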