* One of the articles defines "Big Red Button" AI systems as systems that take some sort of input and generate an output with no human intervention in the process; they automate away much of the process. In my class, there is a lot of discussion that treats large language models as "Big Red Buttons" and therefore as uncreative. I'd like to challenge the assumption that large language models used to generate text, images, audio, etc. are uncreative approaches that remove process. I think that machine learning models like MusicLM, which can generate music from text prompts, are actually tools that can be used in an interactive machine learning sense: they can give users more granularity over the high-level content of the music they want to generate, and are thus not "the end of creativity," as some people might suggest. Ultimately, I think the biggest difference between non-AI and AI tools is that non-AI tools can give users granularity over low-level features, such as how much bitcrushing or reverberation they want, whereas AI tools can give users granularity over high-level features, such as the mood they want their music to convey.
* I think that any AI system that acts as some sort of "oracle," where many parts of the process are automated away, can be easily converted into a useful interactive machine learning system if it is allowed to iterate on past output.
* I don't necessarily think that these controversial large language models, which are so often thought to be destroying the creative process, are doing so; I think they actually generate a new creative process of their own, such as prompt engineering. I think prompt engineering is an art in and of itself.
Please write a ~500 word essay expanding on my thoughts on interactive machine learning outlined above.
Interactive machine learning is a field that has gained increasing attention in recent years as AI systems have proliferated into so many of our lives; it focuses on facilitating human-AI collaboration and can be used to enhance creativity. One of the things that struck me from the readings is the idea of "Big Red Button" or "BRB" systems, which automate away so much of the process that they risk removing much of the meaning from particular works. I do agree that with these systems, a lot of the meaning that is derived from the process itself can be lost, and I don't think these are the sort of models that engineers should be looking to build. However, I do think that in many discussions about large language model-based systems like Stable Diffusion, DALLE 2, MusicLM, or ChatGPT, there is an assumption that they fall under the category of BRB systems. While some argue that LLMs act as "Big Red Button" AI systems that remove the human element from the creative process, I believe that they can actually be powerful tools for interactive machine learning and enhance creativity in new ways.
One common criticism of LLMs is that they generate output with little to no human intervention, making them "uncreative" and detrimental to the creative process. However, I think this assumption overlooks the fact that LLMs can give users greater granularity over the high-level content of the creative output. For example, MusicLM, a machine learning model that generates music from text prompts, can be used to give users control over the overall mood or style of the music they want to create. This level of control allows users to create music that is tailored to their specific needs and preferences, while still benefiting from the creativity of the AI model.
In addition, I believe that any AI system that acts as an "oracle" and automates large parts of the creative process can be converted into an interactive machine learning system if it is allowed to iterate on past output. By allowing users to provide feedback and adjust the output generated by the AI system, the model can learn from previous interactions and become more attuned to the user's preferences over time. This iterative process can help bridge the gap between human and AI creativity, leading to more effective collaboration and more creative output.
Furthermore, I think it's important to acknowledge that LLMs create new creative processes of their own. The process of "prompt engineering" is an art in and of itself. By carefully selecting and refining the prompts used to generate output from LLMs, users can shape the creative output in new and interesting ways. This process requires human input and creativity, so LLMs cannot be seen as a replacement for human creativity. Rather, they can be seen as tools that enable new forms of creativity and collaboration between humans and machines.
In conclusion, I believe that interactive machine learning has the potential to enhance creativity and facilitate collaboration between humans and AI. While some may view LLMs as "Big Red Button" AI systems that remove the human element from the creative process, I think they can be used in ways that allow users to have greater control over the high-level content of the creative output. By allowing users to iterate on past output and engage in prompt engineering, LLMs can facilitate new forms of creativity and enhance the creative process in ways that were previously impossible. Ultimately, interactive machine learning is a powerful tool that has the potential to transform the way we approach creativity and collaboration in the future.
1. Digital art: It would be awesome if you could run something like DALLE 2 in Photoshop such that you also get the layers and filters that produce an image, instead of only the image itself; that way, if there is anything you don't like, you can either feed your output back in as input and fix what you want through natural language, or edit it by hand if you'd like.
2. 3D model texturing: If you are trying to texture a model using software like Blender, you should be able to use a system like DALLE 2 to generate the texture for you, and you should be able to interact with this texturing system in the same way that I outline in point 1.
3. Vocal synthesis: Imagine that you had some sort of text-to-speech system where you can use someone's voice to sing a song according to text that you input, and that you have sliders that control how exasperated, sad, angry, etc. they sound as they sing the words that you typed.
4. Story writing: Imagine that you can use an LLM to write a story by literally conversing with it back and forth, with the LLM giving you really good writing advice on how your characters should behave and/or where your plot twists should go.
5. Education: Imagine that you are able to use an AI mentor that gives you questions on whatever subject you want to learn, chosen according to the areas in which you need improvement; the interaction comes from the correctness of your responses to the questions and your confidence in how correct you are.
6. Interactive drum beats: What if you have an AI agent that responds to what you are playing on a guitar and continually drops a beat that meshes well with what you're playing?
7. Multimodal digital art making: It would be neat if there was software that played music fragments as you painted something digitally. For example, it would be cool to hear menacing music as you are laying down red brush strokes hard. Or soothing music if you lay down blue brush strokes lightly.
8. Imagine that you are playing air drums, and some sort of Kinect-like peripheral device records you and is able to generate drum sounds as if you were actually playing drums.
9. Music source separation: Labelling which parts of a song are drum beats and then using AI to extract said drums.
10. AI-powered gestural synth: You can use AI to recognize hand motions and use them to control a synthesizer, like an AI theremin.
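
To make idea 5 a bit more concrete, here is a minimal Python sketch of how the feedback loop might work. Everything in it is my own assumption for illustration: the `Mentor` class, the `rate` parameter, the 0-to-1 "mastery" score, and the rule for combining correctness with confidence are all hypothetical, not taken from any real tutoring system. The point is only to show that correctness and self-reported confidence are enough to steer which topic gets asked about next.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Mentor:
    """Toy AI-mentor loop (hypothetical): one mastery estimate in [0, 1] per topic."""

    topics: List[str]
    rate: float = 0.5  # assumed step size for updating the running estimate
    mastery: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start every topic at 0.5: no evidence either way yet.
        self.mastery = {topic: 0.5 for topic in self.topics}

    def next_topic(self) -> str:
        # Ask about the topic with the lowest estimated mastery
        # (ties go to whichever topic was listed first).
        return min(self.topics, key=lambda topic: self.mastery[topic])

    def record(self, topic: str, correct: bool, confidence: float) -> None:
        # confidence is the learner's self-report, from 0.0 (guessing) to 1.0 (sure).
        # Confident and correct pulls the estimate toward 1, confident and wrong
        # pulls it toward 0, and an unsure answer pulls it toward 0.5.
        direction = 0.5 if correct else -0.5
        target = 0.5 + direction * confidence
        self.mastery[topic] += self.rate * (target - self.mastery[topic])


if __name__ == "__main__":
    mentor = Mentor(["chords", "rhythm"])
    mentor.record("chords", correct=True, confidence=1.0)
    print(mentor.next_topic(), mentor.mastery)
```

In a real system the question itself would presumably come from an LLM prompted with the chosen topic; this sketch only covers the part where the learner's answers shape what gets asked next.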