Discover more from Dan’s Media & AI Sandwich
Generative AI trends
Generative AI is currently developing at a head-spinning pace, with dozens of innovative new tools launching each week.
But taking a step back, what are some of the emerging macro trends?
Here are 8 things that generative AI will become ‘more’ in the coming months:
Integrated

Generative AI has spawned a lot of new websites and apps in the last 9 months.
We will continue to see standalone services proliferate, but we’ll also see generative AI increasingly added to existing applications and services. Some of those additions will be visible, others will be under the hood (and in Apple’s case, only ever referred to as ‘on-device machine learning’).
Google and Microsoft are already racing to conspicuously add AI elements throughout their product suites, as much for the benefit of shareholders as consumers.
Not all of these integrations will stick, of course, but we are progressing rapidly towards ‘AI everywhere’, where the use of AI will be a given rather than something novel/enticing to foreground in your comms and marketing (like having .com as part of your brand name in the early noughties).
Accessible

Whilst many of the current crop of LLMs and image-generation services are superficially accessible, getting the best out of them demands a new skill: prompt engineering.
After years of learning the pared-down syntax of effective search queries, we’re now having to learn to add the human phrasing and context back in.
Many of us are also resorting to using one generative AI model to create the optimal input for another (e.g. using ChatGPT’s Photorealistic plugin to create Midjourney prompts).
Making AI model interfaces more accessible and intuitive for mainstream audiences will be key to the next wave of adoption (see also Refinable - below).
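To make the idea of prompt engineering concrete, here is a minimal, hypothetical sketch of what tools increasingly do on the user’s behalf: expand a terse request into the richer, context-laden phrasing that image models tend to reward. The function name and template are illustrative assumptions, not any real tool’s API.

```python
# Illustrative sketch only: expand a bare subject into a richer,
# Midjourney-style prompt. The template and parameters are hypothetical.

def build_image_prompt(subject: str, style: str = "photorealistic",
                       details: tuple = ()) -> str:
    """Wrap a terse request in the extra phrasing and context image models reward."""
    parts = [f"A {style} image of {subject}"]
    parts.extend(details)                      # optional scene details
    parts.append("high detail, natural lighting")  # boilerplate quality cues
    return ", ".join(parts)

print(build_image_prompt("a lighthouse at dusk",
                         details=("rocky coastline", "long exposure")))
# → A photorealistic image of a lighthouse at dusk, rocky coastline, long exposure, high detail, natural lighting
```

This is essentially what people are doing by hand today (or delegating to another model, as with the ChatGPT-to-Midjourney workflow above) — and what more accessible interfaces will do invisibly.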
Specialised

One of the most impressive things about the current crop of LLMs is their ability to generate plausible responses across so many different domains.
Whilst large, generalist AI models will continue to fulfil an important role, we’re likely to see an increase in more focused models, trained on more specialist data sets (see Reka, which has just emerged from stealth mode).
As well as reducing contamination from less relevant/reliable training data, models using more focused data sets can be significantly less expensive to train.
Personal

And one of those more focused data sets will be you 😮
In addition to being able to use AI models to create digital clones of our voice (see Play HT, Elevenlabs, Resemble) and physical likeness (see Synthesia, HeyGen), we will increasingly have the option of accessing more personalised AI models, where the training set is our personal data and documents.
Reword claims to have developed an AI model that can write in your style (by training it on articles you’ve written), whilst others (e.g. Mindbank Ai) are pursuing the goal of a digital twin, with the added benefit of perfect recall of every person you’ve met, every image you’ve seen and every word you’ve read, heard or spoken.
Apple’s decision to market itself as the privacy-first company you can trust with your data puts it in pole position to be the custodian of your future personal AI (sorry, personal on-device machine learning).
Refinable

A significant frustration with generative AI, especially when it comes to image and video generation, is the difficulty in refining the output. Regenerating the output will invariably lose some of the qualities you wanted to retain and introduce other unwanted elements.
This is starting to change, with tools that afford users greater up-front control over the output (e.g. ControlNet), greater ability to iterate and less need for prompt engineering (hallelujah).
Being able to use natural language to easily refine individual elements (e.g. make the jacket red and remove the extra finger) without introducing unwanted variance elsewhere is now tantalisingly close.
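The underlying idea is masked regeneration (often called inpainting): only the pixels inside an edit mask are regenerated, so everything outside it is mathematically guaranteed to be untouched. A conceptual sketch, using plain lists rather than any real diffusion API:

```python
# Conceptual sketch (not a real image-editing API): localized editing
# regenerates only the masked region, so the rest of the image cannot drift.

def apply_masked_edit(image, mask, regenerate):
    """Replace only the pixels where mask is True; all others are preserved."""
    return [
        [regenerate(px) if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]

image = [["sky", "sky"], ["jacket_blue", "grass"]]
mask  = [[False, False], [True, False]]   # edit only the jacket pixel
edited = apply_masked_edit(image, mask, lambda px: "jacket_red")
# edited == [["sky", "sky"], ["jacket_red", "grass"]]
```

The hard part tools are now solving is translating “make the jacket red” into that mask and that regeneration automatically.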
Real-time

Whilst current AI models are impressively quick, we’ve been spoiled by near-instantaneous Google results and Netflix stream starts for too long to accept some of the current wait times for responses. Companies have been quick to identify faster responses as a benefit worth charging for in a paid tier.
The quest to bring down response times (and associated compute costs) will continue to be a high priority for companies supporting generative AI services, enabling more things that used to take a long time (e.g. rendering, transcribing, translating) to become real-time.
The training data itself also needs to become more real-time.
We’ve got used to ChatGPT being oblivious to events after September 2021 and falling back on old-school search engines or social media, but the size of the prize in bringing LLMs up to date is sufficiently large that I anticipate significant progress in this area in the coming months.
Conversational

Closer to real-time responses will also be needed to upgrade voice assistants from their current useful, but not-what-we-were-promised, role as a glorified radio alarm clock and transcription service.
Whereas Alexa and Siri can currently only provide an answer if an engineer has programmed a corresponding rule (ELSE: respond with ‘I found this on the web’), LLMs will always provide an answer unless an engineer has created a rule telling them not to.
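The inversion is easy to see in code. A toy contrast (purely illustrative — neither half is how any real assistant is implemented): the rule-based assistant can only answer from an engineer-authored intent table, while the generative one answers everything unless a rule blocks it.

```python
# Toy contrast, illustrative only: rules-first vs. answer-first assistants.

RULES = {
    "what time is it": "It's 9:41.",
    "set an alarm": "Alarm set.",
}

def rule_based_assistant(query: str) -> str:
    # Can only answer if an engineer wrote a matching rule; otherwise punt.
    return RULES.get(query.lower(), "I found this on the web.")

def llm_assistant(query: str, blocked_topics=("medical advice",)) -> str:
    # Stand-in for a generative model: always answers, unless a rule says no.
    if any(topic in query.lower() for topic in blocked_topics):
        return "I can't help with that."
    return f"Here's my best answer to: {query}"

print(rule_based_assistant("why is the sky blue"))  # I found this on the web.
print(llm_assistant("why is the sky blue"))         # Here's my best answer to: why is the sky blue
```

In the first model, engineers write the answers; in the second, they write the exceptions.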
Add AI agents into the mix and voice + generative AI starts to look like an interesting equation in hands-free scenarios (e.g. in the car, when cooking).
The scannability of text responses will continue to give text the edge over voice as a response format in scenarios where you can easily read text on a screen.
Transparent

Today’s generative AI tools are right at the ‘black box’ end of the transparency spectrum, with even the algorithms’ creators and the AI models themselves in the dark as to what inputs informed a given output (“I don’t have direct access to my training data or know where it came from” - ChatGPT).
However, ‘hallucinations’ are a potentially significant barrier to adoption relative to other, equally flawed sources of information which are better at showing their workings (Google, Wikipedia etc.).
Coupled with the pressure from various industries to find a way of crediting contributors, it’s likely we’ll see more generative AI tools which provide sources/references, even if they have to be bolted on after the fact.
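What “bolted on after the fact” might look like, in the simplest possible terms: once an answer exists, match it against a set of retrieved documents and append numbered references. This sketch is a hypothetical illustration — the naive keyword matching stands in for the much harder retrieval and attribution work real systems do.

```python
# Hedged sketch of bolted-on sourcing: the keyword matching is a naive
# stand-in for real retrieval/attribution; function and data are illustrative.

def attach_sources(answer: str, documents: dict) -> str:
    """Append [n] references for each document whose keyword appears in the answer."""
    cited = []
    for n, (keyword, title) in enumerate(documents.items(), start=1):
        if keyword in answer.lower():
            cited.append(f"[{n}] {title}")
    if not cited:
        return answer
    return answer + "\n\nSources:\n" + "\n".join(cited)

docs = {
    "transformer": "Attention Is All You Need (2017)",
    "diffusion": "Denoising Diffusion Probabilistic Models (2020)",
}
print(attach_sources("Transformer models underpin modern LLMs.", docs))
```

Even this crude after-the-fact approach would move generative tools closer to the show-your-workings standard set by Google and Wikipedia.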
Want more posts like this delivered direct to your inbox? Subscribe (for free) here.
Could your organisation use some help making sense of AI? Get in touch.