Reading Response 9

I liked how the Humans in the Loop article broke down the main issues with Big Red Button systems: they offer little user input throughout the process and therefore result in outputs that might have the right style but lack the right meaning. It made me think about DALL-E 2, which is a tool that allows you to generate images from a text description. The other day, as part of a class activity, I wanted DALL-E 2 to illustrate a fun social gathering where friends are sitting around and doing a blind taste-test using miracle berries. The prompt "friends miracle berry tasting with blindfolds digital art" led to the following images:

DALL-E pics

I think these pictures are super creepy. Why are the people so close together? Why are the backgrounds all yellow and green, as if something sinister is going on? I tried a bunch of different variations, and nothing communicated the meaning I was going for. When I added the word "happy", for example, the pictures remained green and looked even more sinister:

More DALL-E pics

I think DALL-E is a super powerful and interesting tool, but it would definitely be more useful if it allowed additional user input. Even the ability to switch out individual colors would make a big difference. Instead, DALL-E currently has an 82-page book to teach you how to create prompts that will generate the image you want. It's really designed with a "humans learn to use the Big Red Button" mindset instead of including the human in the interaction loop.

Thinking about DALL-E has given me an additional way to think about the ideas in the article. Designing with a human in the loop means not only allowing the person to have input, but also giving that person enough fine-grained control that they can have a "final say" in whatever is produced. DALL-E 2 is frustrating because it is hard to control the details of the final image, and so the output feels like it comes from the AI, not the human. Since AI is not good at grasping meaning, it should be used as an intermediary tool for getting the human to where they want to go - but the human should be able to fine-tune any final output to get the meaning right.