At 1,000 miles wide and with sustained winds of up to 90 mph, Hurricane Sandy was a storm for the record books. Millions were left without power, and early cost estimates for the storm have been pegged at about $50 billion.
The storm certainly proved as dangerous as the hype leading up to it, putting the business continuity and disaster recovery plans of any organization in its path to a test unlike any other.
And that it did. While many businesses proved ready, many didn’t. Some of the failures were not critical: the online retailer Fab, for example, kept its website up but advised customers in an email that it would take time before product shipments resumed. Others, however, were crucial, including numerous hospitals that suffered significant power disruptions. “We were able to still care for our patients, but things slowed down considerably because we couldn’t access a core electronic medical record system for several days,” said the administrator of a hospital based in northern New Jersey, whose organization would not permit him to be quoted by name.
Reaction to how well businesses fared with their disaster recovery plans has been mixed. Some have cited numerous failures, while others have lauded the efforts of both the private and public sectors in dealing with a storm that ranks among the nation’s worst. “The story is mixed here,” said Stephanie Balaouras, an analyst at Forrester Research. “On the one hand, I've seen a number of large enterprises, especially those in financial services, really handle the storm well. In advance of the storm, they shifted operations to other locations, proactively closed offices and encouraged employees to work from home or alternate locations.”
Others, however, lost marks, especially when it came to preparedness for the loss of electrical power. “The issue with generators is maddening. If you look at the Japanese tsunami and nuclear crisis, the nuclear crisis itself could have been avoided if the generators were not on the ground. The entire world should have learned from this. In addition to some flooded generators, some generators simply didn't function. I find over and over again, that people don't test them enough,” Balaouras added.
While the response is still underway, a number of additional lessons are to be learned, including how organizations need to plan for the reality of compounding system failures. “We will be watching and learning as much as we can from the incident. Right now, it looks like many simply didn’t contemplate their disaster recovery plan not working,” said Martin Fisher, director of information security at WellStar Health System. "If the power goes down, and the generator fails, what do you do? Do you need a backup generator plan? Some industries like healthcare do,” said Fisher.
Kevin O’Shea, information security practice lead at engineering, construction and technical services firm URS Corporation, agreed. “It appears that some institutions kept their threats siloed and separate: No power, no problem, we have generators and 10,000 gallons of diesel in the basement. Flooding? No problem, we have electric pumps in the basement. But what if the basement floods faster than the pumps can clear it and then you lose power? Risk assessment can get very complicated when looking at multiple events occurring at the same time. Therefore, many assessments will not look at more catastrophic scenarios where 2, 3 and 4 threats converge during one event,” said O’Shea.
O’Shea said that URS proposes clients employ an “effects-based” planning method that aims to examine the potential outcome from many different types of threats. “For example ‘Loss of Power’ can be the effect for a wide range of natural and manmade threats. Developing a plan to deal with the effect isn't tied to a specific threat. In this way, we can directly manage the risks to the business related to the loss of power independently of managing the risks related to a specific threat that could cause a loss in power. We believe this is an effective way to begin bridging the threat silos that can be created during a threat/risk assessment,” he said.
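The effects-based approach can be illustrated with a small sketch, assuming a hypothetical threat catalog and plan names (none of this is URS’s actual methodology or tooling): many distinct threats converge on a handful of shared effects, and each recovery plan is keyed to an effect rather than to any single threat.

```python
# Hypothetical sketch of effects-based planning: recovery plans are
# keyed to effects ("loss of power"), not to the threats causing them.

# Many distinct threats converge on a few shared effects.
THREAT_EFFECTS = {
    "hurricane":    {"loss of power", "flooding", "staff unavailable"},
    "ice storm":    {"loss of power", "staff unavailable"},
    "grid failure": {"loss of power"},
    "burst pipe":   {"flooding"},
}

# One recovery plan per effect, reused across every threat above.
EFFECT_PLANS = {
    "loss of power":     "start generators; verify fuel; fail over to alternate site",
    "flooding":          "run pumps; relocate equipment above grade",
    "staff unavailable": "activate remote-work and on-call rosters",
}

def plans_for(threat):
    """Return the effect-level plans triggered by a given threat."""
    return {effect: EFFECT_PLANS[effect] for effect in THREAT_EFFECTS[threat]}
```

The point of the structure is that adding a new threat only requires mapping it to existing effects; the plans themselves, and the risk management around each effect, stay independent of any particular cause.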
Cloud computing aided resiliency
Experts we interviewed said lessons have been learned since both September 11, 2001, and Hurricane Katrina. But since those two events, much of business IT has been reshaped by the advent of cloud computing and highly virtualized systems, with outcomes both good and bad.
“Overall, I think cloud does help,” said Balaouras. “Tier one cloud and SaaS providers such as Google and Amazon operate their cloud services from multiple data centers and can simply shift workloads to other locations as needed. They are also able to deliver a level of availability that many organizations could never achieve themselves. This extends from the resiliency of the data center infrastructure itself to the resources that they invest in high availability and disaster recovery capabilities.”
“Cloud computing can absolutely help in BC/DR operations,” said O’Shea. “For example, we saw several large webhost providers switch to alternate locations when their primary data centers went offline in New York City. However, businesses must be organized in such a way as to be able to offload critical applications and data to a cloud provider,” he said.
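The failover O’Shea describes can be sketched in miniature, assuming made-up site names and a pluggable health probe; real providers implement this with DNS or load-balancer health checks rather than a loop like this.

```python
# Simplified sketch of primary/alternate-site failover, with made-up
# site names. A health probe is tried against each site in priority
# order, and traffic goes to the first site that answers.

SITES = ["primary-nyc", "alternate-site"]

def pick_site(is_healthy):
    """Return the first site whose health probe succeeds."""
    for site in SITES:
        if is_healthy(site):
            return site
    raise RuntimeError("no healthy site available")

# Normal operation: the primary answers, so traffic stays put.
assert pick_site(lambda site: True) == "primary-nyc"

# Primary data center offline (as in New York during Sandy):
# traffic shifts to the alternate location.
assert pick_site(lambda site: site != "primary-nyc") == "alternate-site"
```

The mechanics are the easy part; as O’Shea notes, the failover only works if the critical applications and data have already been identified and replicated to the alternate location.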
O’Shea added that virtualization, identification, and segregation of critical business processes are important steps that get organizations prepped for cloud services. “But rarely do we see businesses that have identified all their critical business processes, identified the critical cyber assets that support those processes, and virtualized and replicated key servers and data. Like many other problems in information security and BC/DR, gaining significant leverage from cloud services is not solely a technology issue; it involves a coordinated effort across the business processes, infrastructure and culture,” he said.
Part of the issue there is the very way people approach cloud services themselves, said Fisher. “It’s a two-edged sword. Cloud makes things easier operationally to recover, because the heavy lifting is done by the provider. However, it’s harder procedurally because you have to integrate that cloud services provider, who isn’t under your direct control, into your BC/DR planning. This adds more complexity,” he said.
One of the biggest issues with cloud computing when it comes to BC/DR however, is also one of the most solvable. “People totally forget about BC/DR for cloud systems, because they believe they’ve outsourced all operations, and they then have no idea what to do when disaster hits,” said Fisher.
That’s why it’s crucial for organizations not only to plan how they’ll move forward without any number of their cloud services in the event of a disaster, but also to diligently vet the resiliency of their cloud providers. “One thing that worries me is that organizations do not do a thorough job vetting cloud and other third-party providers. In fact, I find many organizations have no idea what kind of uptime or resiliency capabilities their providers have; they never asked,” said Balaouras.
Fisher agreed. “Not all cloud providers will be forthcoming with their BC/DR abilities, which is a red flag itself. But you need to ask about their resiliency, and you need to develop a plan for what you will do should they go down,” Fisher said.
About the author:
George V. Hulme writes about security and technology from his home in Minneapolis. You can also find him tweeting about those topics on Twitter @georgevhulme.