Move over Midjourney?

There’s a new AI image generator in town and it’s faster and better at rendering text

Aug 08, 2024

AI-generated image of two supercars racing near the Golden Gate Bridge. The car nearest to the camera has the numberplate FLUX.1. Image generated using FLUX.1

AI image generators tend to be poor at rendering text. Ideogram is a notable exception to this, although it can’t consistently match Midjourney for image quality or coherence (see my earlier comparison of AI image generators).

Last week, Black Forest Labs emerged from stealth and released a new suite of models called FLUX.1, which look like they might just offer the best of both worlds: high image quality and coherence, plus decent (but by no means perfect) text rendering.

They’re also much quicker than Midjourney (~5 seconds vs ~30 seconds in my tests).

FLUX.1 doesn’t yet have a friendly product interface, although you can have a play with the non-commercial models on fal.ai (the weights for these models are openly available, which means developers will be able to build on them).

Below is its response to the prompt ‘Photo of a man in a black hoodie holding a spray can next to an old stone wall with graffitied text which reads "This is not a photo". The paint is running slightly’.

AI-generated image of a man in a black hoodie holding a spray can next to old stone wall with graffitied text which reads "This is not a photo". Generated using FLUX.1

Not bad, although the eagle-eyed will notice a rogue fifth finger, a perennial issue for AI image generators (although Midjourney has got much better at this).

AI-generated image of six open hands held aloft. Hands up if you’ve got better at generating images of hands with the correct number of digits (image generated using Midjourney)

Black Forest Labs haven’t disclosed what content FLUX.1 was trained on but from the images I’ve been able to generate (e.g. Super Mario and Darth Vader in combat) it’s safe to assume the training set included copyrighted material.

AI-generated image of Super Mario and Darth Vader in combat. Image generated using FLUX.1

A text-to-video model is in the works and the company has just secured $31 million in seed funding. However, it’s not clear to me what their moat is. They’ve combined some innovative approaches, but it seems inevitable that other models will follow suit.

Still, it’s just got easier to create images like this:

AI-generated image of a tattoo on a man's back which reads "As a large language model I cannot help you decide on the best message for your tattoo". Image generated by Dan Taylor-Watt using FLUX.1

You’re welcome.

Dan’s Media & AI Sandwich
