Lessons from Netflix and Twitter’s live streaming fails
Live streaming at scale is hard. What to do when it all goes Pete Tong?
People tend to underestimate the challenges of live streaming at scale.
Twitter yesterday experienced repeated technical issues with its live audio stream of Ron DeSantis’s 2024 presidential campaign launch on Twitter Spaces.
And last month Netflix had major issues with its second foray into live streaming, the Love Is Blind Reunion special.
Co-CEO Greg Peters gave this explanation on Netflix’s quarterly investor call:
“I would start by saying we're really sorry to have disappointed so many people. We didn't meet the standard that we expect of ourselves to serve our members. And just to be clear, from a technical perspective, we've got the infrastructure. We had just a bug that we introduced. Actually, when we implemented some changes to try and improve live streaming performance after the last live broadcast, Chris Rock in March, and we just didn't see this bug in internal testing because it only became apparent once we have put sort of multiple systems interacting with each other under the load of millions of people trying to watch Love Is Blind. So we hate it when these things happen, but we’ll learn from it and we’ll get better”.
Elon Musk gave a less expansive, less apologetic and more Trumpian explanation for the Twitter Spaces issues: “Twitter new account signups just went ballistic…I call it ‘massive attention’. Top story on Earth today”.
Whilst some commentators have been quick to decry both companies’ competence, the reality is that live streaming at scale is hard and any issues are amplified by the number of people simultaneously clamouring for that content.
I vividly recall the moment that BBC iPlayer started struggling during the final minutes of England’s quarter-final match against Sweden in the 2018 World Cup. I was in my garden with a big group of friends looking at a buffering wheel on a large projector screen and praying the issue was with my wifi (it wasn’t).
Even when a live stream is working as expected, there’s latency to contend with (the delay introduced by encoding, distributing and decoding the media). That’s not a problem for some live content, but it’s a real issue for sport that is also being distributed over broadcast networks (which have much lower latency), where a cheer from next door a minute ahead of a crucial shot being taken can quite literally give the game away.
So, should streaming services throw in the towel on live and leave it to broadcast? No, but as well as doing everything they can to preempt potential issues, streamers should also have a clear game plan for what to do when things do go wrong.
Here are six recommendations for what to do in preparation for, and in response to, issues with a live stream:
1.) Make it someone’s sole focus to communicate out from the engineering teams to the external comms/social media/customer support teams
You want your engineering teams focused on diagnosing and fixing the issue, not trying to explain it to various different teams. This point person can be from any number of different disciplines but needs to be able to translate the technical back and forth on Slack into a clear message for those managing the comms to the public and the press.
2.) Scenario plan the various potential issues and consider the product experience and messaging in each
Whilst post-mortems/post-incident reviews are now standard practice in most companies streaming at scale, pre-mortems are, like accessibility and analytics, often something that gets skipped or compressed in the run-up to a major product release.
Push back the deadline and/or reduce scope to make sure you’ve got enough time to work through how the product experience will degrade in various scenarios and what levers and dials are needed to prevent a domino effect and to recover as quickly as possible.
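One way to make a pre-mortem concrete is to write the scenarios down as data the whole team can review before the event. The sketch below is purely illustrative: the scenario names, levers and message wording are all hypothetical assumptions, not a description of how Netflix, Twitter or the BBC work. It pairs each failure scenario with the intended product experience, the lever to pull and a pre-agreed user message, and flags any scenario nobody has planned for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One pre-mortem entry. All field values below are hypothetical."""
    product_experience: str  # how the product should degrade
    lever: str               # the control the on-call team can pull
    user_message: str        # pre-agreed wording for comms (no ETA, no root cause)


# Hypothetical playbook, agreed before the live event.
PLAYBOOK = {
    "stream_start_delayed": Scenario(
        product_experience="Show a holding slate instead of a spinner",
        lever="Switch player to holding slate",
        user_message="We know the stream hasn't started. We're working on it.",
    ),
    "playback_buffering": Scenario(
        product_experience="Drop to a lower bitrate rather than stall",
        lever="Cap maximum bitrate",
        user_message="Some viewers are seeing buffering. We're on it.",
    ),
}


def plan_for(scenario_name: str) -> Scenario:
    """Return the agreed plan, or fail loudly if the scenario was never planned."""
    try:
        return PLAYBOOK[scenario_name]
    except KeyError:
        raise LookupError(
            f"No pre-mortem entry for {scenario_name!r}; add one before launch"
        ) from None


def unplanned(scenarios: list[str]) -> list[str]:
    """List the scenarios from a brainstorm that have no playbook entry yet."""
    return [name for name in scenarios if name not in PLAYBOOK]
```

Running `unplanned(["playback_buffering", "login_outage"])` in a pre-launch review would surface `login_outage` as a gap to work through while there is still time to design the degraded experience and the messaging.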
3.) Acknowledge the issue quickly
As soon as you have validated that some users are experiencing an issue, acknowledge it on social channels and in response to direct contacts. Even if it turns out that only a small number of users are affected, it’s better to get on the front foot with acknowledging the issue.
Users who know that an issue has been clocked and is being actively addressed are less angry than those whose angry tweets appear to be falling on deaf ears.
4.) Avoid the temptation to speculate on the root cause or give ETAs for resolution
It’s extremely common to identify a number of red herrings on the way to isolating the true root cause(s) of an issue. Avoid the temptation to communicate unvalidated theories, which you later have to revise/retract.
It’s also worth trying to avoid giving estimates for when issues will be resolved, as a) fixes can take longer than expected and b) something else can always go wrong.
A couple of minutes after Love Is Blind Reunion was due to start, Netflix tweeted “Love is … late. #LoveIsBlindLIVE will be on in 15 minutes!” Only it wasn’t.
5.) Once you have established the root cause, be as transparent as possible about it
Whilst there are sometimes legitimate reasons for not being 100% transparent about issues (e.g. spotlighting security weaknesses), the more open you can be the better when it comes to managing user frustration.
Greg Peters’ explanation to investors did this well.
6.) Apologise
It’s really frustrating when live streams fail. Acknowledge that and apologise. Netflix also did a pretty decent job of this. Twitter, not so much, although I guess an apology was always unlikely to be forthcoming from a company which auto-replies to press enquiries with a poo emoji…
Twitter seems to still be firmly in the “no publicity is bad publicity” camp. Unfortunately that dictum doesn’t hold true infinitely.