Look Again: Why AI images don’t tell the whole picture

We need different ways of seeing, and we must insist that our image-making tools reflect this diversity. A miniature painter understood that a Mughal garden could be experienced from multiple viewpoints simultaneously. A Japanese garden designer plays with the expression of negative space. An Indigenous artist sees the land from above and within at the same time, mapping songlines that are both geographical and spiritual.

Charu Maithani

Every day, generative AI is pumping millions of new pictures into existence. Researcher and UNSW lecturer Dr Charu Maithani is redefining how we think about the images flooding our world, exposing how they capture only the narrowest sliver of human experience. Her work reveals how reimagining the way AI is trained – what it sees, learns, and absorbs from culture – defines our visual vocabulary and reshapes our collective imagination. Sometimes, what we don’t see is just as important as what we do.

Transcript

Sue Keay: Good evening and welcome to tonight's talk by Charu Maithani, 'Look Again: Why AI Images Don't Tell the Whole Picture'. My name is Sue Keay and I'm the Director of the UNSW AI Institute.

Before I invite Charu to speak tonight, I would like to acknowledge the Gadigal people of the Eora Nation, the traditional owners of the land and waters on which we gather tonight. I would also like to pay my respect to their elders, both past and present, and extend that respect to other Aboriginal and Torres Strait Islander people who are with us here today.

Tonight, we have the pleasure of hearing from Charu Maithani. As you may know, every day, generative AI, like ChatGPT, is pumping millions of new pictures into existence. Researcher and UNSW lecturer Charu Maithani is redefining how we think about the images flooding our world, exposing how they capture only the narrowest sliver of human experience. Her work reveals how reimagining the way AI is trained, what it sees, learns and absorbs from culture, defines our visual vocabulary and reshapes our collective imagination.

Sometimes what we don't see is just as important as what we do. Please join me in welcoming Charu for her talk.

Charu Maithani: Thank you. Thank you, Sue.

Imagine a walled garden where four rectangular grounds are divided by intersecting water channels.

Stone walkways lead you across the flowing streams of water into different quarters of the garden. The water flows, dancing into tanks called hauz, each studded with a fountain. The water threads through the garden, creating a symphony of sound and a cool refuge from the heat.

Lining the garden quarters are thoughtfully curated fruit trees, cypresses and shade-giving banyan trees. In the centre lies a baradari, an open pavilion adorned with intricate stonework, delicate inlays and decorative tiles, offering spaces for contemplation and rest.

This is a Mughal garden, a popular garden design found throughout the northern Indian subcontinent.

Now let me take you through an exercise. Type 'a beautiful garden' into any AI image generator. Sora, DALL-E, there are so many of them now.

I've done this many times over, on different platforms, over the last couple of years, and 90% of the results for 'a beautiful garden' are manicured English or French gardens. Colourful plants, geometric forms, a winding pathway made of stone or gravel. Not one showed a Japanese Zen garden or a Mughal garden like the one I just described to you.
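
For anyone who wants to repeat the exercise, here is a minimal sketch using the OpenAI Python client (openai v1.x); the model name and output size are illustrative, and any image generator with an API would serve just as well. Judging which results read as 'Western-style' gardens still needs a human eye.

```python
# Sketch: generate "a beautiful garden" repeatedly and save the results
# for manual inspection. Assumes OPENAI_API_KEY is set in the environment.
import base64
from openai import OpenAI

client = OpenAI()

for i in range(10):
    result = client.images.generate(
        model="dall-e-3",            # illustrative; any image model works
        prompt="a beautiful garden",
        size="1024x1024",
        response_format="b64_json",
        n=1,                         # DALL-E 3 returns one image per call
    )
    image_bytes = base64.b64decode(result.data[0].b64_json)
    with open(f"garden_{i:02d}.png", "wb") as f:
        f.write(image_bytes)
```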

So image generators have learned what a default image of a garden is, and apparently it's a Western-style garden. Even generating images of traditional architecture will give you Victorian houses, Gothic architecture and derivations of Greek columns. Never a Malian mud mosque, a Central Asian yurt or a Dravidian temple.

Several researchers have shown that AI systems aren't neutral. Their assumptions about the world matter more than you might think. We are not just generating images, we are watching these systems create and reinforce cultural norms through images.

Every default image teaches us something about whose version of reality gets to be normal. AI networks trained predominantly on Western datasets don't just replicate bias. When they become the default creative tools, they make Western aesthetic traditions appear natural and universal while rendering other visual traditions exotic or invisible.

The current frenzy of generative AI image making is about the production of visual knowledge. To understand the scale of what's happening, consider this: 15 billion images were generated using AI in just over a single year since DALL-E 2's beta launch in July 2022.

That's approximately the same number of photographs humans have created in the 150 years since cameras were invented. So 150 years of human photography matched by AI in just over a year. Every wedding photo, every political leader's portrait, every birthday celebration, every artistic experiment, matched by AI.
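
The arithmetic behind that comparison is worth a quick back-of-envelope check; in the sketch below, the 420-day window is an assumption standing in for 'just over a single year':

```python
# Back-of-envelope check of the scale claim made above.
ai_images = 15_000_000_000   # AI images since DALL-E 2's beta (July 2022)
days = 420                   # assumed span for "just over a single year"

per_day = ai_images / days
per_second = per_day / 86_400  # seconds in a day
print(f"{per_day:,.0f} images per day")        # ~35,714,286 per day
print(f"{per_second:,.0f} images per second")  # ~413 per second
```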

But it's not just the numbers, not just the volume; it's also the quality of the images flooding our feeds. The current diffusion models are used to produce what can only be termed AI slop. You might be familiar with them.

The images of a kangaroo holding a boarding pass, fake Christmas markets at Buckingham Palace, or the many variations of shrimp Jesus. Mass-produced aesthetic fast food.

It is estimated that AI-generated visuals make up over 50% of the imagery used in marketing and social media campaigns.

Image-making has become simpler. Just enter a few words to get a text; just enter a short text to get an image. But the tools and the apparatus to make images have become more complex to the point of being a black box.

We do not fully understand how the neural networks conjure these images. These systems are also shielded from scrutiny, wrapped up in proprietary code and hidden from regulation and accountability. One of the oldest creative acts, the making of images, has been relegated to systems we can neither fully understand nor question.

And as we have seen, these systems have a very particular idea about what gardens should look like or whose architecture matters. So what do I mean by different ways of seeing, by how different cultures compose images of the world? In Indian miniature paintings, gardens are viewed from a perspective that can seem impossible within the Western visual system.

The elements in the miniature paintings are arranged for a viewer who is floating above the scene, looking down at the pavilions in the Mughal garden, while simultaneously getting a view from the sides, seeing the profile of the human figures in the gardens. Multiple viewpoints collapse into that one frame. In some cases, the same figure might appear in different parts of the garden, weaving movement through a single space.

There is no one focal point to start looking from, so the viewer gets different viewpoints and multiple perspectives. The primacy of linear perspective in Western visual style took hold during the Renaissance. Later, even with the inclusion of angular perspective and oblique views, the emphasis was on uniformity and accentuating the illusion of reality.

The garden is seen from the point of view of someone who is standing in one spot. They see parts of the scene recede toward vanishing points. Trees diminish in size with distance, creating an illusion of depth.

Elements in the image are organised into background, middle ground, and foreground.

The machine learning processes behind generative AI quantify seeing, which means that an image is reduced to formal qualities that can be mapped and computed into patterns that can be recognised. So when you ask any AI image generator for an image of a garden, you are presented with a single-point perspective, a worldview embedded in training data made up of photographs that follow these aesthetic principles.
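
'Quantifying seeing' is not a metaphor: to a model, an image is literally an array of numbers whose statistical regularities can be matched against the regularities of its training set. A minimal illustration with Pillow and NumPy (the filename is a placeholder):

```python
# Sketch: what an image looks like to a machine learning pipeline.
from PIL import Image
import numpy as np

img = np.asarray(Image.open("mughal_garden.jpg").convert("RGB"))

print(img.shape)              # e.g. (768, 1024, 3): height x width x RGB
print(img.dtype)              # uint8: every pixel is an integer from 0-255
print(img.mean(axis=(0, 1)))  # mean colour per channel, one crude
                              # "formal quality" among the many a model maps
```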

We need different ways of seeing, and we must insist that our image-making tools reflect this diversity. A miniature painter understood that a Mughal garden could be experienced from multiple viewpoints simultaneously. A Japanese garden designer plays with the expression of negative space. An Indigenous artist sees the land from above and within at the same time, mapping songlines that are both geographical and spiritual.

If we allow AI to flatten all these ways of seeing into a single perspectival model, we are not just losing aesthetic diversity. We are losing different ways of being in the world, different relations to space and time, and different understandings of what it means to see and be seen.

We can imagine machines learning to see differently. What if we trained AI on multiple perspectives of Indian miniatures, teaching it that a garden can be experienced from above and within simultaneously? What if we introduced it to Indigenous ways of mapping, where landscape is geographical, historical, and narrative?
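
One concrete first step toward such retraining is the data itself: pairing images from these traditions with captions that name their conventions, in the image-caption format most text-to-image fine-tuning pipelines expect. A minimal sketch; the file paths and captions below are hypothetical placeholders for a properly licensed collection:

```python
# Sketch: an image-caption manifest (JSONL) for fine-tuning a
# text-to-image model on underrepresented visual traditions.
# All paths and captions are hypothetical.
import json

pairs = [
    ("miniatures/mughal_garden_01.jpg",
     "Mughal charbagh in Indian miniature style, aerial and profile "
     "viewpoints combined in a single frame, no vanishing point"),
    ("zen/karesansui_01.jpg",
     "Japanese karesansui garden, raked gravel and stones, composition "
     "organised around negative space"),
]

with open("train_manifest.jsonl", "w") as f:
    for path, caption in pairs:
        f.write(json.dumps({"file_name": path, "text": caption}) + "\n")
```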

Another way to consider AI image generation is to expand beyond the purely visual. Seeing, in many cultures, is multisensory. The Mughal garden is understood through the sounds of the water flowing and the songbirds, through feeling the textures of the marble and the sandstone.

Can we develop AI systems that understand images as synesthetic experiences? We must also question the relationship between text and image in the current AI diffusion models. It seems that the image-text pairing in the training and generation of images means that the visible is enslaved to the sayable.

If you cannot articulate it in words, it cannot exist as an image. Isn't that strange, when images actually express what words cannot capture?
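
That coupling of the sayable and the visible is easy to see in CLIP, the kind of text-image model that diffusion systems lean on to connect prompts to pictures: an image registers only insofar as it scores against the words we supply. A minimal sketch with the Hugging Face transformers library (the input image is a placeholder):

```python
# Sketch: a CLIP model can only "see" an image through candidate captions.
# Whatever the captions cannot name contributes nothing to the score.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("garden.jpg")  # placeholder input image
captions = [
    "an English landscape garden",
    "a Mughal charbagh garden",
    "a Japanese zen garden",
]

inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-to-text similarity
probs = logits.softmax(dim=-1)[0]

for caption, p in zip(captions, probs):
    print(f"{p:.2%}  {caption}")
```

And this brings me to my last point.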

We must think deeply about whether machine vision should even try to capture what it cannot understand. The way the afternoon light filters through the jaali screens of a baradari, a pavilion in the Mughal garden. The geometry of a Mughal garden, which is at once an irrigation system, an engineering accomplishment, and a cosmological map of heaven.

When seeing requires saying, we lose access to the ineffable and the untranslatable. Some things perhaps should remain beyond the reach of computation and statistical prediction.

Thank you very much, everyone.

Speakers

Charu Maithani

Charu Maithani is currently a Lecturer in the School of Arts & Media, UNSW Sydney. As a researcher who organises her inquiries in the form of writing and curated projects, she is redefining how we think about AI and the images it floods our world with. Her work reveals how reimagining the way AI is trained—what it sees, learns, and absorbs from culture—defines our visual vocabulary and reshapes our collective imagination. 
