When AI image generators can research, reason and render text
How Google's latest image model unlocks new practical applications
Three months ago I wrote a post titled AI image editing just levelled up, following the release of Google’s Nano Banana image generation/editing model.
In addition to reliably generating decent-looking images with high prompt adherence and plausibly combined reference images, its superpower was making it easy to modify elements of an image whilst leaving the rest unchanged, using simple natural language instructions (“give the chicken a hat and add some fireworks in the background”).
Image edits that would previously have required some serious Photoshop skills (or some advanced AI tools and prompting skills) and a chunk of time could be rapidly executed to a passable quality by anyone via the Gemini website or app.
Now Google has released Nano Banana Pro, built on top of its impressive new flagship multimodal AI model, Gemini 3 Pro. Nano Banana Pro not only ups the level of prompt adherence and the number of reference images it will reflect in a single generation (to 14!), but also introduces a couple of new capabilities that represent a material step-change in the utility of AI image generation.
A picture that includes a thousand words
Accurately rendering text has been a longstanding challenge for AI image generators. Early image generation models would often produce mangled text in otherwise usable images (less than half the models in my December 2024 test could accurately render my name and job title).
Ideogram zeroed in on this challenge and, for a period, made its better-than-the-rest text rendering something of a USP (whilst market leader Midjourney focused more on the artistry of its generations). However, Ideogram still struggles to reliably render large amounts of text and sometimes fluffs short sentences.
A couple of weeks on from launch, it’s clear Nano Banana Pro can render whole pages of text with a fairly high level of accuracy and a high degree of stylisation.
This opens up a whole new swathe of text-centric use cases where AI image generators previously weren’t quite up to the job: menus, posters, magazine covers, comic strips, packaging, marketing collateral, classroom worksheets.
Part of the reason Nano Banana Pro is so good at rendering text is its second new superpower: the ability to research and reason.
An AI image generator that can research and reason
I’ve written before about reasoning and research modes. When invoked by an AI image generator, they enable the model to explore different approaches to realising an image and take account of fresh information from the web.
Here are three examples of the sort of thing these capabilities enable:
1. Mocking up an on-brand billboard ad with up-to-date info
I asked Nano Banana Pro to create a Sky Sports billboard ad for the next Ashes Test, including the date and UK time, and a headline which takes account of the result of the first Test, without providing it with those details.
The perspective and alignment of the copy are a little off. Otherwise, I’d say it’s a pretty decent attempt that hits all the elements of my brief.
Perhaps even more interesting than the final image is the summary you get of the model’s ‘thinking’ process when you click on ‘Show thinking’:
It’s researched the result of the previous Test and the details of the upcoming one, and based the creative on that. Creative which feels pretty on-brand for Sky Sports.
My use of the word ‘billboard’ prompted it to show the ad in situ (I appreciated the clearly UK setting). Asking for just the design as an asset tidied up the text, although it unexpectedly changed the background on the right-hand side (below).
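If you’d rather script this sort of brief than work through the Gemini app, Google’s google-genai Python SDK exposes the same models. Here’s a minimal sketch; the model ID is my assumption (preview IDs change), as is the assumption that the image model accepts the SDK’s standard Google Search grounding tool, so check the current docs before relying on it:

```python
# A minimal sketch of the billboard brief via the google-genai Python SDK.
# The model ID below is an assumption (preview IDs change over time); the
# Google Search tool is what lets the model research the fixture details
# and the result of the previous Test for itself.
from google import genai
from google.genai import types

client = genai.Client()  # expects GEMINI_API_KEY in the environment

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed Nano Banana Pro model ID
    contents=(
        "Create a Sky Sports billboard ad for the next Ashes Test, "
        "including the date and UK time, with a headline that takes "
        "account of the result of the first Test."
    ),
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

# Generated images come back as inline-data parts alongside any text
for part in response.candidates[0].content.parts:
    if part.inline_data:
        with open("billboard.png", "wb") as f:
            f.write(part.inline_data.data)
```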
2. Assessing and annotating the claims in a printed document
We got a leaflet through our door the other day from an unnamed individual/group looking to rally opposition to a proposed 5G mast.
I took a photo of it and asked Gemini to stick it to a whiteboard and “annotate with a scientifically-grounded assessment of the key claims”.
On display here is Nano Banana Pro’s ability to convincingly recontextualise a reference image, to read and research the information within it, and to present back its assessment as appropriately positioned, legible text in a distinctive style.
Of course, the usual disclaimer about generative AI’s propensity to make things up and reflect training data bias applies (although I trust Gemini 3 Pro a great deal more than the author of this leaflet), and I would want to carry out some independent research before taping Nano Banana Pro’s takedown to any neighbourhood lampposts.
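For completeness, the same SDK handles this kind of image-in, image-out editing: you pass the reference photo alongside the instruction. Again a sketch, with the same assumed model ID and a hypothetical local filename:

```python
# A sketch of the leaflet example: pass a reference photo alongside an
# editing instruction and read the edited image back out. The filename
# and model ID are placeholders/assumptions on my part.
from google import genai
from PIL import Image

client = genai.Client()
leaflet = Image.open("leaflet_photo.jpg")  # hypothetical local photo

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed Nano Banana Pro model ID
    contents=[
        leaflet,
        "Stick this leaflet to a whiteboard and annotate it with a "
        "scientifically-grounded assessment of the key claims.",
    ],
)

for part in response.candidates[0].content.parts:
    if part.inline_data:
        with open("annotated_leaflet.png", "wb") as f:
            f.write(part.inline_data.data)
```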
3. Infographics based on recent data
I asked Nano Banana Pro to create me an infographic summarising what the 2025 UK Budget means for different illustrative families.
It’s not the most visually appealing of infographics, but it does illustrate Nano Banana Pro’s ability to distil key information and lay it out clearly.
I didn’t spot any obvious factual errors or spelling mistakes, although I feared some biases absorbed from training data might have informed the design.
I asked Gemini whether it was potentially reflecting bias by only flagging the sugary drinks tax increase in the low-income family example. Here’s its response:
I also asked it whether it thought there was any unconscious bias at play in the ethnicity of its illustrative families. Here’s its response:
Less than two weeks on from launch, it’s clear infographics are going to be a common use case for Nano Banana Pro.
Google has already begun rolling Nano Banana Pro out to NotebookLM, Vids and Google Slides, including the tempting option to ‘Beautify this slide’, with an impressive, if representative, example (below).
Top banana?
Google isn’t the only AI company developing reasoning-enabled image generation models (Black Forest Labs released the FLUX.2 family of models last week, which it’s positioning as ‘frontier visual intelligence’). However, Nano Banana Pro is currently the most advanced and the most accessible (anyone can try it out on the Gemini website by selecting ‘Thinking with 3 Pro’ from the model dropdown and ‘Create Images’ from the tools dropdown).
Whilst we’re overdue an updated image model from OpenAI, I suspect it will be catching up with, rather than overtaking, Google, which is now leveraging its strength in other verticals (e.g. web search) to open up a bit of a lead in AI image and video generation.
Implications
Accurately rendering text and applying research and reasoning to AI image generation significantly increases its potential applications.
It doesn’t mean Sky should let their creative agency go. Rather, it offers a new way of rapidly visualising different creative executions.
It doesn’t mean infographic designers are no longer required, but it does lower the barrier to presenting information in this format, which will inevitably lead to more infographics of varying quality.
It’s certainly going to necessitate more scrupulous proofreading and fact-checking. I suspect the visual polish of Nano Banana Pro’s outputs will result in mistakes being missed that would have been caught in plain text.
What it doesn’t do is remove the need for human decision making and accountability. The brief (can we start saying ‘brief’ instead of ‘prompt’?) remains critical, as do decisions about what needs refining (and how), where AI enhances and where it detracts, and where a human touch is essential.
Dan’s Media & AI Sandwich is free to read. If you’ve found this post interesting, please consider liking and/or sharing it. If you’d like to chip in to support my writing, you can do so by becoming a paid subscriber. Contributions make it easier for me to dedicate more time to writing. Thanks to those of you who’ve already become paid subscribers. If you’d prefer to make a one-off contribution you can do so at buymeacoffee.com/dantaylorwatt