The 5 key ingredients of effective AI image prompting
Last week I did a comparison of leading AI image generators, concluding that Google’s Nano Banana Pro is the best option for most people, most of the time.
But choosing a capable model doesn’t guarantee good results.
This week, I’m sharing the five ingredients I’ve found matter most for effective AI image prompting.
1.) Clarity of purpose
Key to creating an effective image, whether drawn, painted, photographed, designed or AI-generated, is clarity of purpose. The clearer the artist/photographer/designer/prompter is on what they want the viewer to take away from an image, the more likely they are to create an effective image.
Is it intended to be a background image, subtly reinforcing a mood/message? Is it meant to be obviously AI-generated? Would using a non-AI-generated image be practical/better?
2.) Vision
Being clear on the purpose of an image is necessary but not sufficient when it comes to prompting AI image generators, which tend to fare better with visual rather than conceptual instructions.
Some people will arrive at the prompting box with a clear picture in their mind of the image they’re after - the challenge for them is translating that mental image into words. For others, especially those with aphantasia, there won’t be a mental image to describe and it will be straight to words.
I used to direct people to Google’s Say What You See! experiment to learn how to talk to AI image generators.
Mercifully, newer AI image generators powered by multimodal models (such as Nano Banana Pro and ChatGPT Images) are able to understand natural language, removing the need for arcane syntax.
3.) Direction
However, it can be hard to think of what to include in your prompt when staring at a blank text box. Here’s my checklist, not all of which will be relevant for every prompt:
Medium / style (e.g. photo, illustration, painting)
Subject - who or what is the focus? (age, gender, ethnicity, distinguishing features, demeanour)
Action - what’s happening? (e.g. the dam has just burst)
Setting (e.g. in a North London mews)
Vibe / tone (e.g. candid, formal, dystopian)
Perspective / framing / composition (e.g. first-person, close-up)
Lighting / weather (e.g. studio, golden hour, heavy rain)
Aspect ratio (e.g. 16:9, 1:1, 9:16)
Text - put in quotation marks (e.g. sign says “Dan’s Diner”)
Constraints (e.g. no text, no logos)
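If you write prompts in a script rather than a text box, the checklist above can be sketched as a small helper. This is a minimal illustration, not any generator's API: the class name, the field names and the labelled-line output format are all assumptions made for the example. Since newer models understand natural language, the output is simply a tidy starting point you can paste in or reword.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptChecklist:
    """Hypothetical container for the checklist items; every field is optional."""
    medium: str = ""
    subject: str = ""
    action: str = ""
    setting: str = ""
    vibe: str = ""
    framing: str = ""
    lighting_weather: str = ""
    aspect_ratio: str = ""
    text: str = ""
    constraints: str = ""

    def to_prompt(self) -> str:
        """Join the filled-in items into one labelled line each."""
        parts = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if not value:
                continue  # skip checklist items that aren't relevant
            if f.name == "text":
                value = f'"{value}"'  # put literal text in quotation marks
            label = f.name.replace("_", " ").capitalize()
            parts.append(f"{label}: {value}")
        return "\n".join(parts)


# Example values are illustrative, borrowed from the checklist above.
prompt = PromptChecklist(
    medium="photo",
    subject="a diner at dusk",
    setting="in a North London mews",
    lighting_weather="golden hour",
    aspect_ratio="16:9",
    text="Dan's Diner",
).to_prompt()
print(prompt)
```

Leaving a field empty drops it from the prompt, which mirrors the point that not every checklist item will be relevant every time.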
4.) Vocabulary
A rich vocabulary can help. Asking for “a Georgian townhouse with peeling sage-green paint, wrought iron balcony with rust blooms, cracked limestone steps” will likely generate a richer and more distinctive image than “an old building”.

Many generators now offer the option of having AI enhance your text prompt. However, this brings with it a loss of control, as the AI may elaborate your prompt in quite a different direction to the one you had in mind.
The more vivid and precise you can be in your prompts, the greater control you can exercise over the outputs.
There may be times when you don’t have a clear idea in your mind. Fortunately, newer models’ ability to elaborate a simple prompt in different directions (Ideogram’s particularly good at this) can result in impactful images you would never have thought to describe.
5.) Discernment
Just as important as what goes in (the prompts) is the assessment of what comes out. Does the image deliver on its purpose? Is further refinement required?
I’ve hand-drawn a lot of images in my life and the hardest part of the process is knowing when to put the pen/pencil down. A similar judgement call is required in AI image generation.
Equally important is knowing when a particular route has not delivered as you’d hoped, and that the resulting output should not be inflicted on a wider audience.
‘AI slop’ is increasingly used interchangeably with ‘AI-generated’, but I think we should try to preserve its original meaning, which Wikipedia distils as “lacking in effort, quality, or meaning, and produced in high volume as clickbait to gain advantage in the attention economy”.
It’s not slop because it’s AI-generated, but because of a (usually deliberate) lack of discernment and regard for the viewer.
As with hand-rendered imagery, returning to AI-generated images the following day can help you look at them with a fresh eye to determine whether they’re up to snuff (and potentially spot the extra limb you missed yesterday).
Ultimately, the effectiveness of an AI-generated image comes back to the same question: what will viewers take away from it? If the execution undermines, rather than reinforces, the intended message then resist the temptation to use it. The tools may be new - the judgement isn’t.



