BA IT outage shows hacking not the only enterprise risk
31 May 2017
It was a busy holiday weekend, and the British flag carrier was forced to ground all flights out of London’s two main airports, Heathrow and Gatwick. The failure affected the airline’s operations around the world, with knock-on effects for call centres and online booking sites, making the situation even more frustrating for stranded passengers.
Most operations have now been restored, the airline says, but more than 1,000 flights were cancelled and 75,000 passengers stranded.
But the problems were not due to some evil cyberattack or ransomware assault. Instead, it was just another “global IT system failure,” reportedly British Airways’ sixth such incident in the last year alone.
Power issue blamed
As the chaos continued, British Airways CEO Alex Cruz blamed the outage on a relatively short power surge that was strong enough to prevent a back-up system from starting properly for several hours, possibly affecting data synchronisation. Cruz declined to say where the problem was located, and the BBC reported that he is resisting calls to resign over the incident.
According to The Independent, the airline is still working to figure out the precise causes, but some are blaming a common combination of modern and legacy technologies. According to The Register, British Airways maintains two large data centres near its Heathrow headquarters.
Meanwhile, the GMB union, representing much of British Airways’ IT staff, blamed the problem on the company’s recent outsourcing of many IT functions to Tata Consultancy Services in India. According to The Sun, sources claim the airline’s back-up system “failed to take over when the primary [IT system] failed due to a power cut,” and the problem spiralled out of control because “inexperienced staff in India didn’t know how to kick-start the airline’s back-up system.”
The airline responded, saying, “We would never compromise the integrity and security of our IT systems.” That sounds great except that, one way or another, those systems’ integrity has obviously been compromised. And Cruz, meanwhile, claimed the incident had nothing to do with cost cutting: “They’ve all been local issues around a local data centre, which has been managed and fixed by local resources,” he told Sky News.
Perhaps not, but the incident points to the often-overlooked human factor in disaster recovery systems. It is not enough to have the equipment in place to deal with outages and other incidents. It is not even enough to develop plans to deal with problems. It is critical to make sure key personnel are knowledgeable and practiced in actually implementing those plans.
That usually means regular failure and back-up testing, with game-day simulations of all the things that can possibly go wrong. To be fair, it is not clear to what degree British Airways has such measures in place, but it seems clear that whatever the airline was doing, it was not enough. And it is not hard to imagine that the complex international nature of the airline’s IT systems and staffing did not help.
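The game-day testing described above can be sketched in miniature. The Python below is a purely illustrative toy (the service names and `run_drill` helper are hypothetical, not any airline's actual setup): it injects a failure into a "primary" service and checks that routing fails over to the backup, which is the essence of what such a drill verifies.

```python
# Minimal sketch of an automated failover drill.
# All names here are hypothetical illustrations; a real drill would
# exercise live infrastructure, not in-memory objects.

class Service:
    """A stand-in for a data centre or system that can be healthy or not."""
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def check(self):
        return self.healthy


def active_service(primary, backup):
    """Route to the primary if it passes its health check, else the backup."""
    return primary if primary.check() else backup


def run_drill(primary, backup):
    """Simulate a primary outage and verify the backup takes over."""
    primary.healthy = False              # inject the failure
    took_over = active_service(primary, backup) is backup
    primary.healthy = True               # restore state after the drill
    return took_over


primary = Service("primary-dc")
backup = Service("backup-dc")
print(run_drill(primary, backup))  # True if failover works
```

The point of running such a drill regularly, rather than writing the plan once, is that it exercises both the mechanism and the people operating it, which is exactly the gap the BA incident appears to have exposed.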
Again, according to The Sun, sources close to British Airways are calling the situation “appalling,” with one saying, “No major company should be in a position where 300,000 of its customers were stranded, with little to no information. … It is a shambles. Heads should roll for this.”
Given the magnitude of the problems, and an estimated £100 million (€115 million) compensation bill, it is quite likely IT leaders could lose their jobs over the weekend’s events. And if that is not enough to spur an immediate and careful review of company disaster recovery plans and procedures around the world, it would be hard to know what is.