Facebook’s Sept. 23 outage was caused by a software flaw that crippled its database clusters, the company confirmed. The downtime was the worst at the social network in four years.
The outage that darkened Facebook for two and a half hours Sept. 23 was caused by a software flaw in its database clusters, the company confirmed.
Facebook went down—the company called it the “worst outage we’ve had in over four years”—around 1:30 p.m. EDT Thursday and didn’t go back up until 4 p.m. EDT.
Some of the 500 million-plus Facebook users took to Twitter during the outage, wondering what they would do without access to the photos, links, videos and other content they had shared on the massive social network.
“The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition,” said Robert Johnson, director of software engineering at Facebook, in a blog post.
“An automated system for verifying configuration values ended up causing much more damage than it fixed.”
One fault cascaded into many, forcing Facebook to halt traffic to the failing database cluster. The company then slowly allowed users back onto the Website.
The company has turned off the automated system for verifying and correcting configuration values and is looking to pattern the configuration system after other systems at Facebook.
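The recovery described above, halting traffic to a failing cluster and then letting users back in gradually, can be illustrated with a minimal sketch. This is not Facebook's implementation, which the company did not publish in this detail; the class name `AdmissionGate`, the ramp percentages and the bucketing by user ID are all assumptions made for illustration.

```python
class AdmissionGate:
    """Hypothetical gate: block all traffic to a backend, then reopen it in stages."""

    def __init__(self, ramp_percentages=(0, 10, 50, 100)):
        # Assumed ramp schedule: percentage of users admitted at each stage.
        self.ramp_percentages = ramp_percentages
        # Start at the last stage, i.e. fully open.
        self.stage = len(ramp_percentages) - 1

    @property
    def admit_percent(self):
        return self.ramp_percentages[self.stage]

    def halt(self):
        """Drop to the first stage, which admits nobody."""
        self.stage = 0

    def advance(self):
        """Move one stage up the ramp; no effect once fully open."""
        if self.stage < len(self.ramp_percentages) - 1:
            self.stage += 1

    def admits(self, user_id):
        """Deterministically place each user in a bucket from 0 to 99."""
        return user_id % 100 < self.admit_percent


gate = AdmissionGate()
gate.halt()                    # cluster is failing: stop all traffic
blocked = gate.admits(42)      # False while the gate is closed
gate.advance()                 # operators reopen to 10% of users
early_user = gate.admits(5)    # True: bucket 5 is under 10
late_user = gate.admits(42)    # False: bucket 42 must wait for a later stage
```

Admitting users in fixed buckets rather than at random means the same user is not let in and then locked out again as the ramp advances, and the backend sees load return in controlled steps rather than all at once.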
“We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously,” Johnson concluded.
via Facebook Outage Triggered by Database Software Error – Web Services, Web 2.0 and SOA from eWeek.