A story of AI and Plato's Cave

Some time ago, while interacting with an LLM, I had an idea. Remember Plato’s allegory of the cave?

I wanted to write a story that takes a spin on this idea. We are first presented with a man trapped in a room, with access only to the prompt of an LLM and no memories of the outside. So he has to rediscover the world solely through the descriptions and teachings of the LLM. How does the outside world look? How does nature work? What would he experience were he to be freed?

Transition from Plato’s cave to a man in an empty room prompting an LLM. Created using the Stable_Diffusion_KLMC2_Animation notebook by Katherine Crowson.

This poses a few challenges for me. How do I decide what the man remembers and what he has no memory of? Clearly he must know language in order to use the prompt. He might not question the notion of a computer, so deeply ingrained in his whole life’s experience. Which words will he not know, or not understand what they refer to? Which will produce an eerie feeling of nostalgia or familiarity? The issue here is that I know what the model means, so I have to be careful to avoid biasing the narrative as much as possible. I have to put myself in his shoes and try to figure out which terms or phenomena would draw his attention the most. Other concepts might feel too grandiloquent to delve into at the moment, given his limited understanding of the universe.

As time goes by, the LLM’s responses might not only enrich his understanding but also distort it. For instance, in the first part, the man will end up convinced that water and rocks are non-living organisms. Perhaps this seems harmless, but how do these errors compound over time? Such a flawed conception of the world is hard to keep track of, so I’ll have to figure that out as well.

I have a few ideas for future conversations, like asking the model questions about the man’s feelings: trying to put a name to what he is experiencing (pain, anxiety, confusion) based solely on his descriptions of it. Which, in the end, is really me verbalizing those feelings and asking the model to identify them, right? Eventually, beyond chat alone, I want to let the man use Stable Diffusion or similar text-to-image models to depict past descriptions of objects (as given by Assistant, the LLM). Some premises could also be challenged: how does he know “outside” is even a thing? Could we convince the model to convince him that there is no real world? The man has no way to tell whether the model is right or wrong, so he can’t call out its lies or misinformation. Could the man get Assistant to prompt itself?

I will limit each post to 10 Q&As so that it doesn’t take too much of my time, which I believe will encourage me to keep writing. If I let each post drag on for too long, I’ll eventually get lazy about the whole thing and post less consistently. It also adds to the narrative, I think, since our main character needs time to process the LLM’s infodump.

While my writing might not be the best, I hope this little experiment encourages me to keep a better cadence on my blog and helps me explore how LLMs perceive the world.
