A year ago, I posted a video of my AI-generated avatar.
In it, my avatar (reading a script I’d prepared) joked that the next time you were on a Zoom call with me, I might be asleep and you may be conversing with him instead.
Last week, HeyGen released a new Interactive Avatar feature that makes that unsettling scenario much more possible.
In addition to approximating your physical likeness and voice, your Interactive Avatar can be connected to a Large Language Model, such as ChatGPT or Claude, enabling it to respond in near real-time.
You can add further information and instructions you want your avatar to take into account via a Knowledge Base (similar to ChatGPT’s Custom Instructions).
And you can invite your Interactive Avatar to a Zoom call.
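Conceptually, the setup is a simple loop: transcribed speech goes to an LLM primed with your Knowledge Base instructions, and the reply is handed to the avatar’s voice and lip-sync renderer. A rough sketch, assuming a generic LLM call (every name here, including the stubbed `call_llm`, is illustrative and not HeyGen’s actual API):

```python
# Illustrative sketch of an interactive-avatar turn, not HeyGen's real API.

KNOWLEDGE_BASE = (
    "You are the speaker's avatar on a video call. "
    "Keep answers short and conversational."
)

def call_llm(system_prompt: str, user_message: str) -> str:
    """Stand-in for a real LLM call (e.g. ChatGPT or Claude)."""
    return f"(reply shaped by knowledge base) {user_message}"

def avatar_turn(transcribed_speech: str) -> str:
    """One conversational turn: user speech in, avatar speech out."""
    reply = call_llm(KNOWLEDGE_BASE, transcribed_speech)
    # In a live system this string would be fed to the text-to-speech
    # and lip-sync renderer rather than returned to the caller.
    return reply
```

The latency in that round trip (speech-to-text, LLM, text-to-speech, video) is what produces the lag noted below.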
Below is a quick exchange between me and my newly minted Interactive Avatar.
There’s obviously no shortage of telltale signs that it’s an AI (made even more apparent by being intercut with actual footage of me):
- The voice isn’t a great likeness and wouldn’t fool anyone who knows me (although I appreciated the AI’s attempt to explain it away as the result of a cold).
- There’s a slight lag in the AI responding (although this could probably be blamed on slow broadband).
- The lip syncing is imperfect (although worse in this upload than in the original video).
- Some of its dialogue is very obviously AI-generated (although much of it isn’t; favourite line: “But, you know, human brain processing can be a bit slow sometimes”).
Of course, all of these things will improve to the point that they no longer give the game away, and we’ll then be wholly reliant on products signposting the artifice of these avatars.
HeyGen does that through a visible watermark and appending ‘HeyGen Interactive Avatar’ to the end of the avatar’s participant name. However, less responsible AI companies and open-source equivalents won’t.
We may have already started to adjust to the fact that we can’t be sure that the text, images and on-demand videos we encounter haven’t been altered or wholly generated by AI. Having to question the authenticity of live video is going to be another big adjustment.
One of my AI predictions for 2024 was that LLMs will increasingly be used in combination with other tools. Another was that more of us will be using an AI as an assistant, delegate and/or companion. Plumbing video avatars into LLMs is going to supercharge the burgeoning AI companion space, which is already in a different league when it comes to time spent.