The AI perfection expectation

Why the fallibility bar is higher for machines than humans

Jan 17, 2024

A chrome humanoid robot wearing a black t-shirt with white text reading ‘Rage Against the Machine Learning’. Image generated by Dan Taylor-Watt using DALL-E 3

Just before Christmas, autonomous vehicle pioneer Waymo released data from 7.1 million fully driverless miles indicating that its self-driving cars are materially less likely than human drivers to be involved in a crash resulting in an injury or a police report. It mirrors the findings of a number of other studies.

Also in December, King’s College London shared a study which concluded AI trained on 2.8 million historic chest X-rays was “just as accurate or more accurate than the doctor’s analysis at the time the X-ray was taken for 35 out of 37 conditions (94%)”.

Other recent research by Imperial College London showed AI detecting 5-13% more breast cancers than radiologists.

And last Friday, Google shared news of its Articulate Medical Intelligence Explorer (AMIE), which outperformed primary care physicians not just in diagnostic accuracy, but also in terms of empathy.

And yet confidence in self-driving cars is trending down rather than up, and most people say they would still choose a human doctor over an AI for diagnosis and treatment.

Whilst self-driving cars and AI-driven medical diagnoses aren’t yet widely available, I anticipate resistance to both will remain stubbornly high, even once the technologies are more mature and the statistics more definitively and dramatically in favour of AI-led approaches over the 100% human-powered alternative, for the following reasons:

1.) Scale of impact

An irresponsible or incompetent human driver can have a devastating impact on a relatively small number of people in their immediate vicinity before they are relieved of their licence. A flaw in the software guiding thousands or millions of autonomous vehicles or medical diagnoses has the potential to have a far greater impact before it is detected and corrected.

Less than five months passed between the crashes of Lion Air Flight 610 and Ethiopian Airlines Flight 302; only after the second was the Boeing 737 MAX grounded and the common cause of the crashes (a flaw in the flight control system) identified. Whilst no less tragic, the human-caused crash of Germanwings Flight 9525 was limited to a single plane.

Similarly, the horrific murders committed by rogue medical practitioners such as Lucy Letby and Harold Shipman are ultimately limited by the number of patients they come into contact with. Widely deployed AI medical systems have the potential to cause (or fail to prevent) countless deaths in the future.

2.) Automation bias

Key to mitigating this risk is ensuring humans remain not just in the loop but in the driving seat, awake and with their hands firmly on the wheel.

There is a growing body of evidence that AI can trigger automation bias, which has sadly found a new poster child in the shape of the British Post Office scandal (aside: what a splendid demonstration Mr Bates vs The Post Office is of the positive power of public service media. It’s a programme that would have been highly unlikely to have been commissioned by one of the global streaming giants).

In his research paper ‘Falling Asleep at the Wheel: Human/AI Collaboration in a Field Experiment on HR Recruiters’, Fabrizio Dell’Acqua concludes that “As AI quality increases, humans have fewer incentives to exert effort and remain attentive, allowing the AI to substitute, rather than augment their performance.”

Without the right mitigations, it appears increased adoption of better AI is going to lead to more screw-ups as a result of humans napping at the wheel, which in turn will undermine confidence in AI, particularly in high-stakes domains like healthcare and transportation (side note: Elon Musk and Tesla have a lot to answer for here with their liberal use of the beguiling but bullshit phrase ‘full self-driving’).

3.) Culpability

When serious mistakes are made, it’s human nature to want to apportion blame. AI systems, with their many authors, moderators and end-users, their mysterious inner workings and their increasing ability to self-improve, make apportioning blame a challenge.

We want a single, wringable neck, not a faceless machine to rage against.

4.) Forgiveness

As well as a neck to wring, it appears we also need a human face to forgive. Research by César A. Hidalgo, author of How Humans Judge Machines, found that “people take a consequentialist approach to judging machines, wherein intent is irrelevant, but not to humans”, concluding: “People judge humans by their intentions and machines by their outcomes.”

5.) Expectations of perfection

Not only are we less inclined to forgive machines their misdemeanours, we also have higher underlying expectations of their reliability. In fact, our baseline expectation of machines is perfection. We expect them to accurately and unwaveringly complete the tasks assigned to them. We don’t cut them some slack because they might be tired or hungry or stressed or distracted or confused, as we regularly do with humans.

Consequently, we react strongly to the news of a self-driving car critically injuring a pedestrian, whilst a human driver critically injuring a pedestrian is so commonplace as to not be newsworthy (the WHO estimates 1.19m people die each year as a result of road traffic crashes, with an additional 20-50m suffering non-fatal injuries).

6.) Resistance to change

Another factor is good old resistance to change, with a perhaps predictable variation by age demographic. A recent poll by a Californian health product and service testing company (so, pinch of salt on the absolute percentages) found 82% of Gen Z respondents would trust an AI diagnosis over a human doctor, compared with 57% of Baby Boomers.

7.) Fear of loss of control

Underlying some of that resistance is a fear of losing control and the potential impact of automation on people’s jobs and sense of purpose.

Maintaining a very high bar for the reliability of machines - far higher than the bar we set ourselves - helps assuage some of those fears, or at least kicks the can further down the road.

Of course, driving and healthcare are both high-stakes domains. We appear to be much more willing to accept the imperfection of machines in return for greater convenience in other domains. Yes, ChatGPT may hallucinate as frequently as a ketamine addict, but it’s still a huge boon to a wide range of tasks.

It seems likely that generative AI will start to shift some of our long-held views on the predictability of machines, as its design - inspired by our own neural networks - results in less binary outcomes and more shades of grey.

I still don’t think I’m ready to forgive Clippy though…

Dan’s Media & AI Sandwich
