11 AI predictions for 2024
Last summer I made some predictions about how AI would evolve “in the coming months”. I took stock of progress against those predictions at the end of September.
Four months on and progress continues unabated. ChatGPT went multimodal with the integration of DALL·E 3, Microsoft integrated Copilot into Microsoft 365, Google ramped up its on-device machine learning with LLM-in-your-pocket Gemini Nano, more applied models emerged (e.g. AlphaGeometry, Weaver) and output looks set to become more refinable and real-time thanks to developments like Recaption, Plan and Generation (RPG) and Adversarial Diffusion Distillation (ADD).
Here are 11 more predictions for 2024:
1.) We’ll see a major step change in the quality of AI-generated video
AI-generated video is currently where AI-generated imagery was 18 months ago – promising but not good enough or reliable enough for most use cases.
I expect to see some major leaps forward for AI-generated video in 2024. In particular, the stability/consistency of subjects (the absence of which is giving much AI-generated video its wobbly quality) and the ability to refine output at a granular level.
Runway and Pika look set to continue to lead the way here alongside image-generation leader Midjourney (which has indicated it will soon be joining the video party).
I also expect some of tech’s big dogs to start transitioning their models from research into consumer-facing products: Meta (which unveiled Make-A-Video in September) and ByteDance and Google (which have just shared details of their latest video models, MagicVideo-V2 and Lumiere respectively).
One major milestone will be video outpainting getting good enough to turn video shot in portrait into landscape (and vice versa), reducing the need to shoot in two aspect ratios or use cropping/pillarboxing.
2.) Apple will seriously up its generative AI game, reimagining Siri and upgrading Apple Watch and AirPods to become the preeminent AI wearables
Apple has been far and away the quietest of the tech giants when it comes to AI. Just the occasional mention of ‘on-device machine learning’ at product launch events and the quiet release of its MLX framework and open-source multimodal LLM ‘Ferret’.
However, history has taught us that when Apple looks like it’s sleeping on a new technology, it’s more likely to be marshalling its troops to unleash a far more impactful implementation of that technology.
Google has already shown its hand(set) with the integration of its Gemini model with its Pixel 8 Pro and, more consequentially, the Samsung Galaxy S24. An on-device implementation of the smallest variant (Gemini Nano) coupled with online access to Gemini Pro (and Gemini Ultra when it finally arrives) delivers a solid suite of value-add features such as translation, summarisation and photo editing, plus one potential game-changer in the form of ‘Circle to Search’ (see prediction #6).
I anticipate Apple will use the launch of iOS 18 and the iPhone 16 to introduce a raft of new AI-enabled features, including an LLM-powered Siri.
It will focus on on-device processing, foregrounding privacy and security whilst continuing to shun the AI label.
Apple’s control of both hardware and operating system on Mac computers will enable it to integrate AI in a way that is challenging for Microsoft, who are currently trying to persuade OEMs to add a Copilot key to PC keyboards.
Apple has a couple of other potential aces up its AI sleeve in the form of the Apple Watch and AirPods. Whilst other companies try to make an AI pin or pendant an acceptable wearable or persuade you to buy glasses with a built-in camera or carry another glass and plastic cuboid around with you, Apple Watches are already on millions of wrists (around 300 million have been sold since 2015), whilst an estimated 200 million pairs of AirPods have been sold.
Powered by a reimagined Siri, the Apple Watch and AirPods have the potential to become the mass-market AI wearable the new entrants are all dreaming of, whilst Apple work on seriously slimming down the form factor and price tag of their AI headset.
I will eat my hat if real-time translation of audio into dozens of languages doesn’t become a native AirPods Pro feature (bad news for Timekettle).
3.) The New York Times and OpenAI / Microsoft will agree a data licensing deal and avoid going to trial
The New York Times used the twixtmas news lull to file a lawsuit against OpenAI and Microsoft alleging copyright infringement.
Whilst there’s likely to be some brinkmanship, I believe all parties are incentivised to hammer out a deal rather than going to trial.
OpenAI and Microsoft will be keen to avoid a judgement that makes using copyrighted material for training more difficult and/or expensive, and the New York Times will be mindful of the risk of coming away with nothing except reduced negotiating power in agreeing a price for licensing their content.
4.) There will be lots more training data licensing deals
Regardless of the outcome of the NYT’s legal action, I anticipate we’re going to see the trickle of data licensing deals we saw towards the end of last year (e.g. Axel Springer’s deal with OpenAI) turn into a flood.
Limiting generative AI training data to public domain material doesn’t feel like a sustainable solution. Nor does providers of generative AI services continuing to mine other people’s data without consent or payment.
5.) LLMs will increasingly be used in combination with other AI tools
The clue’s in the name. The thing that Large Language Models are best at is language, both interpreting it and generating it (thanks to the magic of tokens and transformers).
However, because they’re so good at this, we’ve fallen into the trap of expecting they’ll be good at other things humans (or other machines) are good at and judging them harshly when they come up short.
Whilst those developing LLMs continue to try and address/mitigate some of their current weaknesses (e.g. hallucination, bias, plagiarism, maths), we’re going to see more instances of LLMs being paired with other AI tools to create greater-than-the-sum-of-their-parts experiences. LLMs will bring the natural language smarts, other tools will bring the logic/robustness.
Right now the joins between LLMs and other tools are pretty obvious and the handoffs pretty inelegant (e.g. ChatGPT invoking plug-ins/GPTs or ‘Doing research with Bing’). I expect those joins to become less obvious as companies with more of a pedigree in creating elegant user interfaces (e.g. Apple) integrate LLMs into their product suites.
6.) Advances in computer vision will unlock new consumer use cases
So much focus has been on AI’s new-found ability to generate content that the progress made in another branch of AI - computer vision - has received less attention than it merits.
Traditionally applied to narrow domains (e.g. recognising/tracking faces, gaits, number plates, store inventory, manufacturing defects), computer vision has reached an inflexion point where it can be applied in broader contexts.
Google’s new ‘Circle to Search’ feature on Android makes the capabilities of Google Lens accessible at an OS level. I would be surprised if Apple didn’t make similarly creative use of computer vision in its next hardware and OS releases.
7.) Increasingly egregious viral deep fakes will prompt federal legislation
Deep fakes have been around for years. However, the recent step change in ease of generation and quality of output, coupled with social platforms’ seemingly insurmountable moderation challenge*, threatens harm on a different scale (*side note: AI is probably our only hope of addressing the moderation challenge).
More benign/humorous deep fakes such as @deeptomcruise (Feb 2021) and the pope in a puffer jacket (Mar 2023) have increasingly given way to more mendacious uses of the technology.
Last week, explicit deepfakes of Taylor Swift went viral on the hellsite formerly known as Twitter, with one image reportedly viewed 47 million times in the 14 hours before it was taken down.
And it’s not just video. An audio deepfake of President Biden telling New Hampshire voters not to vote in the state primary election was used in a robocall.
It’s also not just public figures who are being targeted.
The UK is slightly ahead of the game in legislating against the sharing of explicit deepfakes thanks to an amendment to the Online Safety Bill.
I would be astonished if the recent Swift and Biden deep fakes don’t prompt some new federal legislation.
8.) Global elections will shine a brighter spotlight on some of the ways in which AI can be used for ill
Regardless of legislation, which may or may not arrive in 2024, the biggest election year in history™ is certain to shine a spotlight on a whole panoply of nefarious uses of AI.
OpenAI has already suspended the developer account behind a bot impersonating US presidential candidate Dean Phillips and further deepfake images, videos and audio of presidential candidates - and Biden in particular - feel inevitable.
Perhaps more worrying than high-profile deep fakes, which tend to be quickly debunked (although maybe not quickly enough if it’s polling day), is the volume of lower-level disinformation that AI will enable.
An ocean of social media posts, news stories and direct emails without the tell-tale signs of phishing emails of yore.
And we thought Cambridge Analytica was bad…
9.) Use of generative AI in professional creative work will become more commonplace but stigma will remain
As I’ve written previously, use of generative AI in professionally-produced output is a charged topic, with undeclared uses being seized upon and assumptions made about the impact on flesh and blood creatives.
However, we’re starting to see more declared uses.
Hotel Chocolat’s announcement of its latest TV ad, ‘Velvetise into Happiness’, calls out “the talented folk at Gravity Road [who] have harnessed the power of generative AI to take viewers on a playful journey” and has been well received by the industry.
More advocates are also speaking up in higher church areas of the media, such as fine art and literature.
After winning the prestigious Akutagawa Prize, Japanese writer Rie Kudan spoke unapologetically about using ChatGPT to write portions of her novel.
I expect this trend to continue in 2024, although stigma will remain, with concerns around copyright and the impact on jobs at the forefront of what’s holding back more widespread adoption.
Bonus prediction: 2024 will see the first label-supported AI-generated music mega hit.
10.) More of us will be using AI as an assistant, delegate and/or companion
As Google, Microsoft and Apple increasingly embed generative AI into the operating systems and applications we use every day and more refined standalone AI products emerge, more of us will start using AI as an assistant.
An always-available colleague who can be relied on to rapidly and uncomplainingly research a topic, analyse some data or knock out a first draft of an email or presentation.
Some will also start delegating more complex tasks as AI agents start to deliver on their promise (although I’d suggest rabbit’s CEO Jesse Lyu overestimates our willingness to fully delegate holiday planning and choice of pizza topping).
That will include acting as our representative in certain situations, as it becomes more commonplace for our AI delegate to interact with other AI delegates before determining it’s time to bring the humans in. Applications range from navigating those annoying automated phone call decision trees on your behalf to establishing whether there’s sufficient chemistry between AI delegates to merit a conversation or date between their flesh and blood progenitors.
AI companions also look set to be a huge growth area, building on the popularity (and extraordinary average visit duration) of Character AI. Options already include conversing with a deceased relative (yes, really).
11.) We will start to see more embodied AI
The first generation of generative AI experiences has primarily manifested in web browsers, smartphone and desktop apps and, to a lesser extent, smart speakers.
Those will continue to be the dominant consumer AI interfaces for the foreseeable. However, we will also start to see more embodied AI. Whilst some of that will be variants of the classic sci-fi humanoid (e.g. Figure’s 01 robot, which will soon be rocking up at BMW’s US manufacturing plant), I anticipate more non-humanoid AI-powered devices emerging.
And unlike smart speakers, which have been almost exclusively anodyne cylinders or orbs, I expect we’re going to see a much wider variety of form-factors with more personality (which intriguingly might help us be more accepting of their shortcomings).
Samsung used CES to demo the latest iteration of its BB-8-inspired “AI Companion Robot for the Home” Ballie.
Kids’ toys are an obvious market for this, with an early entry from Grimes/Curio in the shape of AI plush toys. Adult toys are another.
Closer to home, Matt Webb (one of the architects of the personality-filled Little Printer, RIP) is currently raising funding for his AI rhyming clock, the Poem/1. He’s also written an interesting piece on AI hardware products.