Artificial Interlocutors
Why Advanced Voice Mode + Memory marks the beginning of a shift in our relationship with AI assistants
Over the last couple of weeks, ChatGPT Plus and Teams subscribers in the UK have been given access to Advanced Voice Mode via the ChatGPT mobile app.
First demoed in OpenAI’s Spring Update, Advanced Voice Mode enables users to have more natural voice conversations with ChatGPT.
With ChatGPT’s previous Voice Mode, users’ voice input was first transcribed, then fed to a model that responded in text, which in turn had to be converted to audio. As well as taking around 3-5 seconds, this process didn’t allow the model to take account of the speaker’s tone or emotion, and it struggled with multiple speakers or background noise. The result was stilted, stop-start exchanges that will be familiar to anyone who’s tried conversing with a first-generation voice assistant (e.g. Alexa, Siri, Google Assistant).
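For the curious, here’s a rough sketch of that three-hop pipeline using the OpenAI Python SDK. The model names, file paths and prompt are illustrative stand-ins - this isn’t how the ChatGPT app itself is wired up - but it shows why each reply took several seconds:

```python
# Sketch of the old Voice Mode's speech -> text -> speech round trip.
# Illustrative only: model names and file paths are assumptions, not
# how the ChatGPT app is actually implemented.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Transcribe the user's spoken question (speech-to-text)
with open("question.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# 2. Feed the transcript to a text-only chat model
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = completion.choices[0].message.content

# 3. Convert the text reply back into audio (text-to-speech)
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply_text)
speech.write_to_file("reply.mp3")  # three network hops before the user hears anything
```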
Advanced Voice Mode takes advantage of GPT-4o being natively multimodal, bringing response times down to a couple of hundred milliseconds - similar to human response times in conversation - and enabling a more fluid exchange. You can now make affirmatory noises without the AI assuming it needs to stop talking, or, more obviously, interrupt it and it will pipe down.
Some of its capabilities have been reined in. It’s under strict instructions not to impersonate real people. It also now refuses to sing (a skill we saw demoed in the Spring Update). However, it will happily attempt accents.
Here’s an example conversation:
Another feature OpenAI has been slowly rolling out to paid subscribers is Memory.
First announced in February and rolled out in the US in April and the UK in September, Memory helps mitigate one of LLMs’ biggest shortcomings - they are not constantly learning.
Once a model is trained and released into the world it is no longer taking in new information.
Yes, retrieval-augmented generation (RAG) might be used to furnish its response with more up-to-date information than exists in its training data, but the underlying model remains unchanged.
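If RAG is unfamiliar, here’s a toy sketch of the idea: fetch relevant snippets from an external source at query time and prepend them to the prompt. The keyword-match “retrieval” and the two-line knowledge base are obviously stand-ins for a real vector search, but note that the model itself is untouched:

```python
# Toy retrieval-augmented generation: the model stays frozen; we just
# smuggle fresher facts into the prompt at query time.
from openai import OpenAI

client = OpenAI()

# Stand-in "knowledge base" - in practice this would be a vector store
# queried by embedding similarity, not a keyword match over a list.
documents = [
    "Keir Starmer became UK Prime Minister in July 2024.",
    "Advanced Voice Mode rolled out to UK Plus subscribers in late 2024.",
]

def retrieve(query: str) -> list[str]:
    # Crude keyword overlap instead of real semantic search
    return [d for d in documents if any(w.lower() in d.lower() for w in query.split())]

question = "Who is the current UK Prime Minister?"
context = "\n".join(retrieve(question))

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(completion.choices[0].message.content)
```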
Don’t believe me? Log in to ChatGPT, click on your profile icon, select Customize ChatGPT and uncheck ‘Browsing’ under GPT-4 Capabilities. Now start a new chat and ask it who the current Prime Minister is (it will say Rishi Sunak, as GPT-4o’s training finished in October 2023).
RAG has proved a reasonably effective mitigation for LLMs’ frozen-in-time knowledge of the world, but it doesn’t tackle AI chatbots’ inability to remember anything from your previous conversations.
OpenAI’s first attempt to mitigate ChatGPT’s amnesia was Custom Instructions. Introduced in July 2023, the feature enables you to give ChatGPT up to 1,500 characters on what you would like it to know about you to provide better responses, and another 1,500 characters on how you would like it to respond.
I’ve found Custom Instructions useful in getting ChatGPT to respond less verbosely (one of my instructions is ‘Be succinct’), to give prices in sterling rather than dollars and to generally factor my knowledge and experience into its responses, but 1,500 characters isn’t a lot and everything you want it to remember has to be manually added.
In November 2023, OpenAI launched Custom GPTs, which enable users to create, in effect, their own variants of ChatGPT (known as GPTs), each tailored to a specific purpose. The functionality includes being able to upload files for the GPT to treat as a knowledge store.
This enabled me to create Dan Taylor-Watt Bot, which has read my blog posts (including those beyond GPT-4o’s October 2023 knowledge cutoff) and can tell me what I’ve previously written on any given topic.
It also enabled me to create a School Tracker GPT, which digests the lengthy PDF newsletters my daughter’s school sends out each week and informs me what I need to remember.
Handy, but it still requires manual setup, and I have to remember to invoke the relevant GPT. ChatGPT’s default responses remain generic.
Until 5th September, that is, when I got the below, slightly disconcerting, notification that ‘ChatGPT now has memory’.
Since then, it’s been remembering key bits of information I’ve shared with it to make its responses more personally relevant.
You can review the nuggets it’s saved and delete those you’d rather it forget.
Advanced Voice and Memory are both useful features on their own but are even more compelling in combination.
Being able to chat in a natural fashion and have your artificial interlocutor both reflect its knowledge of you and take in new information, which it will subsequently remember, feels pretty different to what’s gone before.
Here’s a couple of minutes of ChatGPT and me shooting the breeze to give you a feel:
What ChatGPT does and doesn’t remember from conversations is still a bit hit-and-miss (see also: humans), although explicitly asking it to remember something tends to work.
Last week, Microsoft updated Copilot with its equivalent of Advanced Voice, although it’s less sophisticated and more prone to mishearing/misunderstanding (Me: “What should I call you?” Copilot: “Lamb hotpot. Unbeatable combo of tender lamb and root veggies” 🤷‍♂️).
Despite the imperfections, the advent of Advanced Voice coupled with Memory feels like the beginning of a fundamental shift in how we interact with AI chatbots and a meaningful stride towards the science fiction vision of artificial assistants/companions we can converse with without having to moderate our way of communicating.