Microsoft Azure cloud platform back after leap year outage
PaaS service running virtually trouble-free worldwide
Tech4Biz | 02 Mar 2012
Microsoft's Azure cloud infrastructure and development service was apparently running nearly trouble-free on Thursday, following a series of outages on Wednesday 29 February that affected multiple aspects of the system.
The Azure service health dashboard showed only one problem at 15:00 GMT, a "performance degradation" in the south-central US Compute zone.
"Our recovery efforts to restore compute service to impacted customers in this sub-region are complete," Microsoft said. However, "a small number of customers in this sub-region may face long delays during service management operations," it added.
Azure's service management component fared the worst during the outage, going down worldwide starting at 01:45 GMT. By 15:00 GMT on Thursday the dashboard showed the service management system running normally, along with other previously affected pieces of the Azure platform, including Reporting, Marketplace and Access Control 2.0.
Microsoft provided some insight into the outage's root causes in an official blog post.
"Windows Azure operations became aware of an issue impacting the compute service in a number of regions," wrote Bill Laing, corporate vice president of server and cloud. "The issue was quickly triaged and it was determined to be caused by a software bug. While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year."
"Once we discovered the issue we immediately took steps to protect customer services that were already up and running, and began creating a fix for the issue," he added. "The majority" of customers and services had been fully restored by 10:57 GMT on Wednesday, according to Laing.
Microsoft is planning to provide an update that will include more details on the problem's root cause, he said. "We sincerely apologise for any inconvenience this has caused."
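Laing's post does not say which time calculation failed, so the following Python sketch is purely illustrative and is not Microsoft's code. It assumes a common form of leap-day bug: computing "one year later" by incrementing only the year, which produces a non-existent date when the starting day is 29 February. The function names are hypothetical.

```python
from datetime import date


def naive_one_year_later(start: date) -> date:
    # Hypothetical buggy version: bumps only the year field.
    # For 29 February this raises ValueError, because the
    # following year has no such date.
    return start.replace(year=start.year + 1)


def safe_one_year_later(start: date) -> date:
    # Hypothetical fix: fall back to 28 February when the
    # target year has no 29 February.
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        return start.replace(year=start.year + 1, day=28)
```

Under these assumptions, `naive_one_year_later(date(2012, 2, 29))` fails with a `ValueError`, while `safe_one_year_later(date(2012, 2, 29))` returns 28 February 2013. Clamping to 28 February is only one possible policy; rolling forward to 1 March is another.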
Azure users took to official forums during the outage, complaining of disruptions to their operations and a lack of concrete updates from Microsoft.
The frustration was still lingering on Thursday for some users. "I think we could have lost two prospects who are testing our system currently," one wrote on the forum. "But I can't imagine the damage this has done to companies with large scale customers. I mean we have chosen Windows Azure due to the redundancy... How can we explain this to our customers."