The art of incident handling
19 January 2018
Incident handling is a fine art.
Striking the right balance between doing what is necessary to address the issue and handling any fallout, be it internal, external or through the media, is critical.
Two very different examples have arisen lately, neither of which, to be honest, is likely to make it into the textbooks as a template for the state of the art.
The first issue was the Spectre/Meltdown debacle, and the other was the iPhone throttling mess.
The two issues were handled very differently, though both only really came to light as a result of external intervention that pre-empted efforts to contain information.
With the iPhone throttling story, it turns out that Apple has been slowing the performance of certain iPhone models as their batteries degrade with age. The idea was that Apple did not want the iPhone ‘experience’ to be compromised by shorter battery life, and so took the decision to throttle performance rather than risk shorter endurance.
The critical thing about this was that Apple not only conceived of this but implemented it, without either the consultation or consent of users.
This has been widely interpreted as massively arrogant and disrespectful by a manufacturer that has, in the past, indicated that the Apple device owner should always have ultimate control, even where that control might clash with the likes of corporate fleet group policies. Ultimate control, it seems, except when Apple deems otherwise.
Given how popular the iPhone is, its users span a broad range of knowledge and experience. It is perfectly reasonable to assume that a very large cadre of owners would indeed opt for the throttling to benefit from less compromised battery endurance. However, giving users neither the knowledge of what was going on nor the ability to opt out of it is unforgivable.
Adding to the unforgivable, it turns out that it is now going to be expensive too. Despite the fact that recent generations of iPhone have scored very poorly on teardown tests in terms of replaceable parts, Apple has been forced into offering discounted battery replacements for older devices, complicating an already vexed set of options for users. It is also, no doubt, counting on the fact that many users will weigh up the cost and the time without their device when deciding whether to go for a new battery for an older device.
Either way, a company that can sneak unwanted albums onto millions of devices worldwide is easily capable of running an information campaign to explain an issue such as natural battery degradation and its effects, and of offering either a ready-made remediation measure of throttling or an opt-out for full performance at the cost of shortened battery life. In fact, in recent days, it has been announced that there will be an option to opt out of the throttling measure, revising the earlier efforts.
By contrast, the Spectre and Meltdown issue could not have been handled more differently.
When the side-channel memory flaws dubbed Spectre and Meltdown were first identified and brought to the attention of the respective chip manufacturers in June of 2017, the information was kept under tight wraps while the full extent of the issues was examined and their impact determined.
The decision to maintain that tight-lipped regime until fixes could be developed and distributed was, arguably, the right one. Given the extent of the flaws, the fact that the hardware affected was so widespread and the number of vendors involved so great, it was probably for the best that it was kept a secret until such time as it could be addressed.
However, when the news broke in late December and early January, there was still much confusion, with different manufacturers making different claims as to the extent of their vulnerability to the issues, despite having had a full six months to deal with them.
Furthermore, several of the fixes were obviously not fully tested and there were widespread issues, from AMD PCs being essentially bricked, to Intel PCs going into unexpected reboots. While performance hits were anticipated, what was not expected was for patches to be labelled “optional” due to unwanted side effects, or patches for patches due to the aforementioned.
Lessons to be learned
The lessons here are that everyone needs to be better at incident handling.
Firstly, when something comes to light and there is a secret scramble to fix it, there should be, right there and then, a strategy for handling the issue becoming public. With every significant development in the remediation process before revelation, that strategy should be updated. Not only that, a decision should be made, while the veil of secrecy still holds, about when the issue should be revealed. This should be at the earliest possible point in the remediation process, while being cognisant of the risks of revelation.
By doing this, those responsible for the remediation appear to be responsible and considered in their actions, not running scared and taken unawares.
Secondly, and especially where consumers are involved, users should be given due respect as the owners of their own devices. No vendor, irrespective of how it is perceived in the market, should adopt the “don’t worry your pretty little head” approach, as it will ultimately bite them, as well it should.
Will Apple, Intel, AMD or ARM ultimately suffer irreparable damage for their poor handling of these respective incidents? Probably not. But have they been tarnished in terms of public perception? I would argue, undoubtedly.
But we can all learn important lessons from their experiences, and keep a copy of that Jon Ronson book on standby.