Three months ago I made some predictions about how AI would evolve “in the coming months”. I added a couple more predictions in this July post.
This post takes stock of those predictions and looks at the progress made against each of them.
1.) Ubiquitous / Integrated
My first prediction was a fairly safe bet: that AI would become increasingly ubiquitous and that whilst standalone AI services would continue to proliferate, we would see generative AI increasingly added to existing applications and services.
Standalone services have certainly continued to proliferate. The catalogue at There’s An AI For That now boasts “8,256 AIs for 2,220 tasks”😳
And last week saw a rush of announcements about AI being added to existing, at-scale applications and services.
On Thursday Microsoft announced it was integrating its “everyday AI companion”, Copilot, into Windows 11 and adding generative AI functionality to Photos and Paint.
On the same day Google announced a raft of new AI tools within YouTube.
And yesterday Meta announced its “advanced conversational assistant” (chatbot) will be rolled out across WhatsApp, Messenger and Instagram, along with some persona-based chatbots, AI stickers and AI editing tools.
2.) Accessible
My second prediction was that AI model interfaces would be made more accessible and intuitive for mainstream audiences.
Last week OpenAI announced that the latest version of its image generation service, DALL-E 3, will be accessible from within ChatGPT.
As well as putting image generation within easy reach of ChatGPT’s 100m+ users, having an LLM on hand to help generate descriptive image prompts promises to be a game-changer in enabling non-expert users to get better outputs.
The promo video brings this to life rather nicely:
3.) Applied
My third prediction was an increase in more focused models, trained on more specialist data sets.
Visible progress has been slower here, although there’s a lot of work going on in this area, and the application of Retrieval Augmented Generation (RAG) is looking promising as a way of combining the language smarts of an LLM with the reliability of a managed dataset.
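For readers unfamiliar with the pattern, here’s a minimal, illustrative sketch of the RAG idea: retrieve the most relevant passages from a managed dataset, then hand them to the LLM as grounding context for its answer. The document store, the keyword-overlap retrieval and the call_llm() stub are all hypothetical placeholders of my own, not any particular vendor’s API.

```python
# Minimal sketch of Retrieval Augmented Generation (RAG).
# All names here are illustrative placeholders, not a real product's API.

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm UK time.",
    "Premium subscribers get priority email support within 4 hours.",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query.
    A real system would use embeddings and a vector store instead."""
    query_terms = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ask the model to answer only from the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below. "
        "If the answer isn't in the context, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for a call to whichever LLM you happen to use."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

if __name__ == "__main__":
    question = "How quickly do premium subscribers get support?"
    context = retrieve(question, documents)
    print(call_llm(build_prompt(question, context)))
```

The appeal of the approach is that the model’s fluency is kept, while factual claims are anchored to a dataset you control and can keep up to date.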
Meanwhile, Getty’s partnership with Nvidia, announced back in March, has now borne fruit in the shape of ‘Generative AI by Getty Images’ - an image generation model trained exclusively on Getty licensed images.
4.) Personal
My fourth prediction was that models would become more personalised.
Google took a step in that direction last week with the launch of Extensions, which enable Bard to integrate with your Gmail, Calendar, Docs and Maps data.
On the visual front, HeyGen’s Instant Avatar Beta represents a fairly significant milestone in the democratisation of digital cloning, enabling selected beta users to create a digital video likeness for free. Here’s mine.
The release of Apple’s Personal Voice and Live Speech promised to do something similar for voice cloning, but I’ve been pretty underwhelmed by the results (is it my strange UK accent that’s the problem, or are Apple deliberately dialling down the realism to try and minimise abuse?)
5.) Refinable
My fifth prediction was that AI output would become easier to refine.
The aforementioned integration of DALL-E 3 with ChatGPT feels like the biggest step forward here, promising to improve the refine-ability of image generation for the masses. OpenAI’s promotional blurb states “If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.”
DALL-E 3 is being made available to ChatGPT Plus and Enterprise customers in early October, so we shall soon see.
6.) Real-time
My sixth prediction was that AI models and the training data powering them would become more real-time, both in terms of response times and in terms of up-to-date training data.
No-one’s announced any major breakthroughs in response times in the last few months, leaving us stuck with the guidance to write shorter prompts.
There have been some modest gains on the training data front: GPT-4, which powers ChatGPT Plus and Bing Chat, now has an extra few months of training data, running up to January 2022 (vs September 2021 for regular ChatGPT). And as of Wednesday, ChatGPT Plus also has access to the web via its ‘Browse with Bing’ feature.
7.) Vocal
My seventh prediction was generative AI making voice assistants more useful.
While the proof will be in the pudding, Amazon last week announced “a smarter and more conversational Alexa, powered by generative AI”.
And this week, OpenAI announced it is adding voice responses to ChatGPT, enabling users to have a back-and-forth exchange without having to read or type.
Whilst Apple’s updates to Siri in iOS 17 are fairly minor, it feels like only a matter of time before she gets a more substantial AI overhaul.
8.) Transparent
My eighth and final prediction was greater transparency, with more generative AI tools which provide sources/references, even if they have to be bolted on after the fact.
Last week Google added an option to Bard that reviews the chatbot’s responses and evaluates whether there’s online content to substantiate or refute its statements. Supported statements are highlighted in green and unsubstantiated statements in orange, with links to sources.
All in all, pretty extraordinary progress in just 13 weeks. It would appear AI doesn’t take a summer holiday.