You may have already seen the video below in your feed and wondered why it’s got AI heads so excited.
Having seen a lot of humanoid robots in film and TV (it’s 47 years since C-3PO made his cinematic debut), it’s easy to focus on the things that aren’t quite there yet: the pauses before responding to questions, the indelicate placement of the cup and plate in the drying rack.
However, what’s most remarkable about this demo isn’t what’s on screen (you could have made this video a few years ago). It’s how it’s been achieved.
Because the robot isn’t following a script or a decision tree. Its movements aren’t being remotely controlled by a human operator. The video hasn’t been edited or sped up.
What’s on display here is a robot seeing, listening, reasoning, responding and manipulating real-world objects in real time, based on patterns its AI models discerned in training data and on what it learns from its sensory inputs.
Breaking it down a bit:
👁️ Computer Vision is enabling it to recognise objects in its environment and their positions relative to one another.
👂👄 Natural Language Processing (NLP) is enabling it to parse speech and respond in a naturalistic voice (note the vocal fry and the “er” at 52 seconds).
💬 The many transformer layers of a Language Model are furnishing it with plausible things to say, including an explanation of its decision-making and an assessment of its performance.
🦾 Advanced robotics are enabling it to manipulate objects with a high level of dexterity and precision.
🤹 And it’s multi-tasking (explaining its previous action whilst picking up trash).
It’s the seamless interaction between the different modalities that feels most transformational to me, with visual and sensory inputs feeding the language processing and vice versa.
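To make that interaction a bit more concrete, here’s a minimal sketch of what a perception–language–action loop of this kind might look like. To be clear, Figure haven’t published their architecture, so this is a guess at the shape of the thing, not a description of it: every class, method and parameter name below (`capture_frame`, `transcribe_latest`, `say_async` and so on) is a hypothetical stand-in for whatever models they actually run.

```python
# Hypothetical sketch of a multimodal robot control loop.
# None of these components are Figure's actual stack; they stand in
# for whatever vision, speech, language and control models it uses.

from dataclasses import dataclass


@dataclass
class Observation:
    image: bytes      # latest camera frame
    speech_text: str  # transcript of any human speech heard


def perceive(camera, microphone) -> Observation:
    """Gather the robot's sensory inputs for this tick."""
    frame = camera.capture_frame()         # 👁️ computer vision input
    text = microphone.transcribe_latest()  # 👂 speech-to-text
    return Observation(image=frame, speech_text=text)


def decide(model, obs: Observation, history: list) -> dict:
    """A vision-language model maps image + speech + dialogue history
    to both a spoken reply and a high-level action (e.g. 'pick up cup')."""
    return model.generate(
        image=obs.image,
        prompt=obs.speech_text,
        history=history,
    )


def act(robot, speaker, decision: dict) -> None:
    """Speech and movement are issued concurrently — the multi-tasking
    visible in the demo (talking whilst picking up trash)."""
    speaker.say_async(decision["reply"])  # 👄 text-to-speech
    robot.execute(decision["action"])     # 🦾 low-level motor policy


def control_loop(camera, microphone, model, robot, speaker):
    """Run perception, decision and action continuously."""
    history = []
    while True:
        obs = perceive(camera, microphone)
        decision = decide(model, obs, history)
        act(robot, speaker, decision)
        history.append((obs.speech_text, decision["reply"]))
```

The point of the sketch is the data flow: the same model output drives both what the robot says and what it does, which is why the speech and the movement stay in step rather than feeling like two separate systems bolted together.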
Released by Figure as a ‘status update’, the video comes with a dearth of official information.
Timed to coincide with the announcement of a $675m funding round, it’s a tantalising glimpse of the progress being made behind closed doors.
The speed of response will of course improve, as will the precision of movement.
What remains to be seen is whether a humanoid form helps temper our expectations of AI perfection. I suspect it might.