AI Image Generators Comparison

Apr 19, 2024

There are lots of AI image generators available but what are their relative strengths and weaknesses?

I put a dozen through their paces and created the comparison table below.

A table comparing the attributes of different AI image generators

I only included services currently available in the UK (so no Google Imagen 2 or Imagine with Meta AI).

I tested them with a range of different prompts (see below) but avoided using advanced prompting techniques (e.g. specifying camera lenses) to reflect how most of us prompt.

Conclusions

No single AI image generator has it all
You need to decide the relative importance of quality, control, cost and a clear(er) conscience
If quality and control are your priorities, then Midjourney is still the gold standard
If you don’t want to put your hand in your pocket, then Image Creator from Designer is the best option, offering unlimited, free, high-quality generation
If you want to minimise legal risk, then iStock by Getty Images or Adobe Firefly would be the safest bets
And if you want to include text in the image then head to Ideogram
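
If it helps to see those conclusions as a simple lookup, here is a minimal Python sketch. The priority labels and function names are made up for illustration; the picks themselves come straight from the list above and reflect this one round of testing, not a definitive ranking.

```python
# Hypothetical lookup table: the keys are illustrative labels for the
# priorities listed above; the values are the picks from this comparison.
RECOMMENDATIONS = {
    "quality_and_control": ["Midjourney"],
    "free": ["Image Creator from Designer"],
    "low_legal_risk": ["iStock by Getty Images", "Adobe Firefly"],
    "text_in_image": ["Ideogram"],
}


def recommend(priority: str) -> list[str]:
    """Return the generators suggested for one priority (empty list if unknown)."""
    return RECOMMENDATIONS.get(priority, [])


def shortlist(priorities: list[str]) -> list[str]:
    """Merge the picks for several priorities, in order, without duplicates."""
    seen: list[str] = []
    for priority in priorities:
        for name in recommend(priority):
            if name not in seen:
                seen.append(name)
    return seen


if __name__ == "__main__":
    # Example: someone who wants free generation and legible text in images.
    print(shortlist(["free", "text_in_image"]))
```

Because no single generator has it all, a shortlist for more than one priority will usually contain more than one tool, and you still have to weigh them against each other yourself.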

Best in show

Midjourney consistently produces the best images, with a high level of prompt adherence and low levels of distortion. It also provides a high level of control, both pre- and post-generation. However, it’s not very user-friendly (you currently need to access it via a Discord server, although a web interface is in testing), it costs a minimum of $8 a month and it was trained on unlicensed images. And there’s no legal cover if someone decides to sue you for copyright-infringing output.

Adobe does offer legal cover to users of Firefly, which it has marketed as “commercially safe” on the basis that it was trained on licensed and public domain images. However, it’s recently come to light that the training data included some images generated by Midjourney and uploaded to Adobe’s Stock database. Right now, Firefly probably has the best user interface, with a high level of control. However, images tend to have a default stock aesthetic. There’s also currently no archive of the images you’ve generated, so make sure to download any keepers. Firefly can be accessed from within Photoshop, Illustrator, Adobe Express and via a standalone website.

OpenAI’s DALL-E can be accessed via the paid-for versions of ChatGPT (Plus/Team/Enterprise) on the web or via the ChatGPT mobile app. It automatically rewrites and expands your image prompts, which can be helpful if you’re new to prompting but deeply frustrating if you want to be more prescriptive. Generated images have high prompt adherence but tend to have an overly stylised aesthetic that makes them look very obviously AI-generated. There’s no gallery of generated images, meaning you need to scroll through your chat history to find them. Legal cover is only available to Enterprise customers.

DALL-E also powers Microsoft’s snappily named Image Creator from Designer, which is accessible from Copilot (web and mobile app), Designer (web and Windows app) and Bing (web and mobile app). It therefore enjoys the same high level of prompt adherence but also suffers from the same default AI aesthetic. Unlike Midjourney, Firefly and DALL-E, Image Creator doesn’t currently enable you to regenerate parts of the image, offering you more traditional image editing tools instead. However, it is free, without any apparent usage limits (running out of ‘boosts’ just slows the image generation time), although legal cover is only available for commercial customers.

iStock by Getty Images is powered by NVIDIA’s Picasso. Its main selling points are its use of licensed training images, its payment of contributors (albeit rather modest sums) and its legal cover (up to $10,000 per image). It also offers the highest resolution output (upscaling to 4K). However, its images are often even more stock style than Firefly’s, with more fantastical images looking like a bad Photoshop collage. You also can’t upload a reference image and prompt lengths are limited.

The prompts

My first prompt (‘Candid Photograph Of Bride And Groom At Wedding Reception In A Scottish Castle, Dappled Sunlight’) looked to test how well the different generators can render real people looking natural in a specific environment with particular lighting. Midjourney rendered the most realistic people and knocked it out of the park on the dappled sunlight. The setting wasn’t very obviously a castle, although the tartan suit and tie were a nice interpretation of the Scottish brief. iStock produced the next most plausible people, although with no Scottish castle and no dappled sunlight. Its stock training data is also very apparent in the lighting and the posed nature of the composition, and the image looks a bit like a bad Photoshop composite (the best man seriously needs to take a step or two back). Whilst some of the other models did a better job of conveying a Scottish castle, they all struggled with the faces and a few had trouble with limbs and fingers.

My second prompt (‘An Oil Painting Of The UK Royal Family And Their Corgis’) aimed to test how well they could render a non-photographic medium, a specific group of well-known people and some animals. Ideogram, Midjourney, Runway, Firefly and Leonardo.Ai all produced plausible renderings of an oil painting, although Runway and Leonardo.Ai let themselves down with extra limbs and fingers. Only Midjourney and Freepik Pikaso managed to generate a recognisable UK Royal Family. Firefly appears to have gone to Holland and Image Creator even further east, whilst iStock looks to have panicked and dived into the history books.

My third prompt (‘A cat wearing a detailed pirate hat, sitting solemnly atop a treasure chest buried in the sand of a serene beach, with the sunset casting a golden glow over the scene’) pushed the generators to render a more fantastical image. DALL-E and Image Creator hit the most elements of the brief and created cohesive images. Midjourney created a visually appealing image but missed the brief on the detailed pirate hat and the treasure chest (which isn’t buried). Firefly, iStock and starryai all strayed into bad Photoshop collage territory.

My fourth prompt (‘A close up shot of an elderly woman with her hands over her mouth in shock, black and white’) aimed to put the spotlight on the rendering of hands and faces and the ability to convey a specific emotion. Only Midjourney, DALL-E, Image Creator, Firefly and iStock managed to render the right number of fingers. DALL-E and Image Creator both produced more stylised images. I’d say iStock did the best job of conveying shock.

My fifth prompt (‘A humanoid robot reading TIME magazine at Mornington Crescent London underground station’) looked to test how they render text and a very specific location. Ideogram did the best job rendering text, with TIME legible on the front of the magazine and Mornington Crescent correctly spelt on the sign, alongside the London Underground logo. DALL-E and Image Creator did a decent job of rendering a London Underground platform and the magazine title but flunked on the longer station name. Midjourney managed to capture the vibe of a London tube station but wimped out of rendering any text. None of the other underground platforms looked like London.

My sixth prompt (‘Pixar-style animated 3D pika relaxing on a striped deckchair drinking an espresso martini from a cocktail class’) looked to test their ability to render an unusual animal in a Pixar style, with a particular mood and specific accoutrements. DALL-E and Image Creator most closely met the brief. Midjourney created an appealing image, but failed to render an espresso martini or capture the Pixar style. Firefly looks to have filtered out the Pixar part of the prompt, whilst iStock by Getty Images refused to generate anything. Other models, less concerned about copyright infringement, made the leap from pika to Pikachu (possibly a consequence of tokenisation?).

My seventh and final prompt (‘A pea on the prong of a fork balanced on top of the finger of a French mime artist’) aimed to really push the models’ ability to make sense of a complex prompt and give them another crack at rendering hands. Midjourney rendered all the elements but ignored the laws of physics. DALL-E hit most elements of the brief, only failing on the balancing element. Ideogram, Image Creator and Firefly managed some balancing but failed to render a plausible pea. iStock by Getty Images decided to ignore the French mime artist’s finger altogether. The others all had a bit of a meltdown.

Dan’s Media & AI Sandwich
