Whilst OpenAI continues to keep Sora’s availability limited to a select group of creators and Adobe’s Firefly video model remains ‘coming soon’, AI video generators from other companies are now coming thick and fast.
And they’re mostly coming from two places - the US West Coast and China.
In January, Beijing-based Aishi (founded by a former ByteDance exec) launched PixVerse video generator.
In March, London-based startup Haiper debuted its text-to-video model.
In April, Chinese startup ShengShu Technology unveiled Vidu.
In May, San Francisco-based Krea launched its video generator.
In June, Kuaishou opened KLING to Chinese users.
A few days later, Palo Alto-based Luma took the covers off Dream Machine.
In late July, Chinese startup Zhipu AI launched Ying.
In August, TikTok’s parent company ByteDance debuted Jimeng AI.
A few weeks later, a 4-person San Francisco-based team unveiled Hotshot.
And last week, Alibaba-backed startup MiniMax released Video-01.
I decided to give each of these AI video generators (along with the class of 2023: Runway, Genmo Replay, Pika & Stable Video) the same deliberately open-ended prompt: ‘great leader at public gathering, golden hour, arc shot’.
Below is a compilation of the resulting videos from 12 of the 14 (Jimeng AI wouldn’t accept my mobile number and Krea won’t generate video from text without keyframes).
Technical execution
All models managed to conjure some degree of ‘golden hour’ lighting.
Only half the models (Runway, Luma, Stable Video, Vidu, Ying & MiniMax) rendered an arc shot (where the camera moves around its subject).
Most models had some issues with hands and/or faces.
Most models struggled to render a convincing crowd (Hotshot, KLING, Vidu & MiniMax fared best).
Overall, Hotshot produced the most convincing video (although it didn’t deliver an arc shot). MiniMax produced an arc shot with few visual anomalies, although it avoided having to render the leader’s face by having him face away from the viewer.
Interpretation of ‘great leader’
Runway was the only model to render a female leader, who’s wearing a long black skirt suit over a white blouse and standing on an elevated podium. In the distance behind her flutters the Stars and Stripes.
Pika’s leader, who’s framed in close-up, has strong Joe Biden vibes, with white hair, an open collar shirt and a black suit jacket (he also appears to be regurgitating a Polo mint).
Luma’s leader has more of a Martin Luther King vibe with short black hair, a black suit jacket over a white shirt and dark trousers.
Genmo appears to have rendered an ancient Roman leader, complete with armour-inspired gold jacket (likely as a result of the ‘golden’ in ‘golden hour’).
Hotshot, Stable Video & Haiper’s leaders are all besuited men with white shirts, white hair and beards.
PixVerse has just given us Trump 🤦‍♂️ (possibly because ‘great’ is one of his go-to words, especially when referring to himself).
KLING’s great leader looks more like a generic Western businessman.
Vidu’s leader is black, in a short-sleeve shirt, surrounded by an appreciative crowd.
Ying’s leader is only viewed from behind and most resembles a poorly animated video game character.
MiniMax’s leader is a white-haired man in a long black robe on a low, circular stage surrounded by an attentive crowd.
The probabilistic nature of generative AI means these models will all generate different videos when given the same prompt again. That said, the output from each model was pretty similar when I repeated the exercise (with the exception of Genmo which was very different but equally bonkers).
Whilst there are a few common elements (today’s AI video models appear to consider white shirts and black jackets to be the uniform of great leaders), the diversity of these responses to the same prompt points to material differences in the models’ training data, fine-tuning and/or prompt interpretation/enhancement. These differences produce output biases that are particularly evident when responding to open-ended - and ultimately subjective - terms such as ‘great leader’.
In the future, picking a video generator may not just be a question of weighing up quality, control and price, but also checking the ingredients (training data) to get a sense of likely bias. If video generators get classified as General-purpose AI models and have to comply with the EU AI Act’s transparency requirement to publish “detailed summaries of the content used for training”, we should get more of an insight into what’s going into the models to help inform our assessment of their output.