Update on my AI predictions for 2024
It’s nine months since I posted my AI predictions for the year and six months since I reviewed progress and added four more predictions. Time for a check-in.
1.) We’ll see a major step change in the quality of AI-generated video
Whilst OpenAI kindly fulfilled this prediction within weeks of me making it by releasing demos of their video model, Sora, there’s been no shortage of activity since, as detailed in my September post, The AI video generation race.
Since that post, we’ve had ByteDance unveil two new video generation models (PixelDance and Seaweed), Haiper release Haiper 2.0, Pika drop its playful Pika 1.5 and Genmo release Mochi 1 on an open-source basis.
In my May predictions update I said “It feels like we’re overdue a video model update from Meta”. Last month, Meta took the covers off Movie Gen, which I anticipate will take AI video generation mainstream once it’s integrated with Meta’s pervasive product portfolio. It also looks like it will deliver on another prediction: “the ability to refine output at a granular level” (see below video).
I predicted a major milestone would be “video outpainting getting good enough to turn video shot in portrait into landscape (and vice versa)”. I don’t think we’re quite there yet. Adobe’s Generative Expand (released last year) is currently the best option, but it requires the edges of the shot to be static.
Generative Extend (first trailed by Adobe in April) is finally available as a beta in Premiere Pro. It enables video editors to extend existing video clips by up to 2 seconds and audio clips by up to 10 seconds. It’s a nice example of AI assisting human creatives. More examples like this will hopefully move us beyond the reductive, all-or-nothing, human-vs-machine framing of AI’s role in creative endeavours.
2.) Apple will seriously up its generative AI game, reimagining Siri and upgrading Apple Watch and AirPods to become the preeminent AI wearables
Check ✅, although AirPods have become hearing aids before they’ve become real-time translation devices (Meta Ray-Bans beat them to the punch on this).
3.) The New York Times and OpenAI / Microsoft will agree a data licensing deal and avoid going to trial
No deal as yet, but also no sign of a trial (the case is still in a protracted discovery phase).
4.) There will be lots more training data licensing deals
Oh yes. See News Corp/OpenAI, Vox Media/OpenAI, The Atlantic/OpenAI, Shutterstock/OpenAI, TIME/OpenAI, Condé Nast/OpenAI, Hearst/OpenAI, Lionsgate/Runway and Reuters/Meta.
We’ve also seen the launch of Perplexity’s Publishers’ Program, which aims to share ad revenue with publishers whenever their content is referenced alongside a revenue-generating interaction. Initial partners include TIME, Fortune & Der Spiegel.
5.) LLMs will increasingly be used in combination with other AI tools
Yep.
Apart from thinking a bit slower, most of the visible development of LLMs in the last six months has been in combining them with other tools and techniques (e.g. Claude computer use - see update on prediction #14, below).
Google managed to finally get some serious attention on its LLM-powered research assistant, NotebookLM (launched last year), by using advanced speech synthesis to create Audio Overviews.
Meanwhile, OpenAI joined the ‘LLM + search engine’ race with ChatGPT Search and more robots are getting VLMs plumbed into them (see update on prediction #11 below).
6.) Advances in computer vision will unlock new consumer use cases
Whilst generating a video from nothing but a text prompt is mind-blowing, using computer vision to analyse and reimagine live-action performances is more immediately transformative.
Move AI continues to lead the way when it comes to full-body capture, with the launch of Move Live in June introducing real-time capabilities.
Runway’s release of Act-One last month marks a leap forward in using facial expressions to drive AI-generated video output (see below video).
ByteDance’s update to X-Portrait looks to have achieved even greater fidelity (although isn’t yet available to consumers).
More prosaically, Apple Intelligence will enable people to search their photos and videos using natural language.
Meanwhile, on the dystopian front, we had Harvard students using Meta Ray-Bans to scan strangers’ faces and collate information about them 😮
7.) Increasingly egregious viral deepfakes will prompt federal legislation
Increasingly egregious viral deepfakes? ‘Fraid so, with brothers grim, Trump & Musk, leading the charge.
Federal legislation? ‘Fraid not (and a federal judge blocked a Californian state law).
8.) Global elections will shine a brighter spotlight on some of the ways in which AI can be used for ill
See above.
Also bots.
9.) Use of generative AI in professional creative work will become more commonplace but stigma will remain
Yes, on both counts.
In late August, NaNoWriMo - a US non-profit - published an FAQ (since amended) that was interpreted as welcoming AI use in creative writing, which prompted an almighty backlash.
In September, ITV launched its ‘GenAI Ad Production for SMEs’ offering, along with a couple of cheap and cheerful AI-generated animated ads for regional businesses.
ITV also briefly advertised for a ‘Head of Generative AI Innovation’ before pulling the role in response to industry criticism.
Last month, Universal Music Group released a Spanish version of ‘Rockin’ Around the Christmas Tree’ using a voice clone to reproduce Brenda Lee’s vocals. Even an endorsement from Lee herself wasn’t enough to placate critics, one of whom described it as “creeping AI tackiness”.
Whilst it made for an entertaining ad (below), Fiverr are mistaken if they genuinely believe that “nobody cares if you used AI”.
10.) More of us will be using AI as an assistant, delegate and/or companion
OpenAI says ChatGPT’s weekly users have grown to 200 million, while Mark Zuckerberg claims Meta AI is approaching 500 million monthly active users (chiefly via WhatsApp). Microsoft & Google haven’t disclosed how many people are using their AI services, although the Copilotification / Geminification of their respective product portfolios shows no sign of slowing down. Meanwhile, Apple Intelligence is finally arriving in dribs and drabs, including an integration with ChatGPT.
To what extent we’re using these mainstream AI services as assistants is hard to gauge.
Most of the AI giants appear to be betting on agents to increase our levels of delegation. Microsoft recently announced “new agentic capabilities” and Project Jarvis is incoming from Google (see update on prediction #14, below).
Use of dedicated AI companionship products is easier to track. According to Semrush, Character AI now attracts over 20 million users a month with an average visit duration of over 14 minutes. Meanwhile, Chai appears to be growing like Topsy.
11.) We will start to see more embodied AI
It’s certainly been a busy six months for AI-powered humanoid robots.
In July, NEURA Robotics released a video of its VLM-enabled 4-NE1 (see what they did there?) doing household chores (which should please Joanna Maciejewska).
In early August, Figure dropped an update to the VLM-powered robot that blew my mind in March, the creatively named Figure 02.
A couple of weeks later, the 2024 World Robot Conference in Beijing featured a record 27 humanoid robots, including debuts from Stardust Intelligence’s Astribot S1 and the Zhejiang Humanoid Robot Innovation Center’s NAVAI Navigator 2.
The following week, 1X Technologies unveiled their NEO Beta. However, its movement in the launch video (below) was so smooth that 1X was accused of putting a guy in a suit.
Since then we’ve had Fourier Intelligence’s GR-2 and XPeng’s Iron.
Meanwhile, Boston Dynamics’ all-electric Atlas, unveiled in April, appears to be getting more independent.
Only Tesla let the side down, with the Optimus Gen 2 robots at its ‘We Robot’ event in October turning out to be remote-controlled by human operators.
Whilst I believe Elon Musk is - as usual - way off the mark with his prediction of 10 billion humanoid robots by 2040, AI is clearly catalysing a space race that is seeing humanoid robots improve in leaps and bounds (time to re-read Ian McEwan’s Machines Like Me).
Beyond the humanoid form-factor, we’ve had the launch of the creepy Friend pendant (“you spent *how much* on a domain name?”) and Plaud’s ‘wearable AI memory capsule’ NotePin. I remain unconvinced that people will be persuaded to carry an additional device around with them and think existing devices (mobiles, watches, earphones and - in time - glasses) will be the primary vehicles for AI out of the home (along with our actual vehicles).
12.) We’ll see the first AI-generated immersive worlds/games
Lots of activity in this space.
In June, Stanford & MIT researchers showed off WonderWorld, a framework for generating 3D scenes in real-time from a single image.
In mid-August, Exists announced its eponymous text-to-game platform (currently in closed beta).
Later that month, Google researchers showed off GameNGen, “the first game engine powered entirely by a neural model”, capable of simulating the granddaddy of first-person shooters, DOOM.
The following week, Virtuals Protocol researchers unveiled MarioVGG, “a text-to-video diffusion model for controllable video generation on the Super Mario Bros game”, and Roblox announced it’s planning to roll out a generative AI tool that lets creators build whole 3D scenes using just text prompts.
The week after, Fei-Fei Li’s World Labs emerged from stealth, promising ‘Large World Models’ that “perceive, generate and interact with the 3D world”, and Tencent (the world’s largest video game publisher) introduced GameGen-O, “the first AI diffusion transformer for open-world video game generation”.
Last month, Dreamworld Realities announced it was using Meshy to add text-to-3D asset generation to DreamWorld: The Infinite Sandbox MMO, and Decart unveiled Oasis, “the first playable, open-world AI model”.
13.) We’ll see machines processing inputs/learning more like humans
In June, MIT shared details of DenseAV which “learns to parse and understand the meaning of language just by watching videos of people talking”.
More recently, researchers at Vienna University of Technology have “created a robot that can learn tasks like cleaning a washbasin just by watching humans”.
And - slightly higher stakes - researchers at Johns Hopkins University have trained a robot to perform complex surgical procedures solely by showing it videos of seasoned surgeons.
14.) We’ll see more AIs navigating GUIs
Hello, Claude computer use, which was announced last month and promises to enable Anthropic’s AI assistant “to use computers the way people do - by looking at a screen, moving a cursor, clicking buttons and typing text”. The public beta is slow and imperfect, but it’s the most tangible glimpse we’ve had to date of what ‘AI agents’ might translate to in practice.
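For anyone curious what that looks like under the hood, here’s a minimal sketch of a request to the computer use beta via Anthropic’s Python SDK. The tool type and beta flag reflect Anthropic’s launch documentation; the screen dimensions and prompt are illustrative, and the harness that would actually execute the clicks and send screenshots back is omitted.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask Claude to drive a (virtual) display; it replies with tool_use actions
# such as "screenshot", "left_click" and "type" that your own code must perform.
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",   # the computer use tool from the beta
        "name": "computer",
        "display_width_px": 1280,      # illustrative screen size
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the browser and search for flights to Lisbon."}],
    betas=["computer-use-2024-10-22"],  # opt in to the public beta
)

# Each tool_use block describes one GUI action for your harness to carry out.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

In practice this runs in a loop: execute each requested action against a sandboxed desktop, return a fresh screenshot as a tool result, and repeat until the model reports the task is done.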
Google’s Project Jarvis, which is expected in December, sounds like it will introduce similar capabilities to Chrome, automating routine browser-based tasks like filling in web forms. Joyous news (except for maybe Zapier).
15.) Perplexity will scale to 100 million users, primarily through word of mouth
100 million queries a week? Yes.
100 million users? Not yet, although it’s currently seeing significant month-on-month growth (according to Semrush, monthly unique visitors are up by more than 40%, from 14.4 million in September to 20.4 million in October).
Whether Google and OpenAI’s own LLM-enabled search experiences can slow Perplexity’s growth remains to be seen.
Subscribe below to get my AI predictions for 2025 first and for free direct to your inbox.