Following a series of unexpected reboots at several high-profile cloud computing service providers, cloud security experts warn that enterprises should prepare now to mitigate the effects of more security-related service disruptions in the future.
Two weeks ago, Amazon Web Services Inc. notified its customers of an EC2 "maintenance update" that required a reboot of about 10% of its hosts globally. The reboot applied a security update that corrected a flaw in the open source Xen hypervisor, which Amazon uses in its cloud architecture; the host servers required a system restart that rendered them unavailable "for a few minutes" while the patches were being applied, according to a blog post from Jeff Barr, chief evangelist for AWS.
The cloud giant successfully patched the flaw and rebooted the EC2 hosts over the course of several days, staggering the reboots so that no two regions or availability zones were affected at the same time.
"The zone by zone reboots were completed as planned and we worked very closely with our customers to ensure that the reboots went smoothly for them," Barr wrote in another blog post.
The need to address the Xen hypervisor flaw, however, wasn't unique to AWS. The day after Amazon first announced its EC2 maintenance update, Rackspace Inc. announced that it too would need to fix the Xen hypervisor flaw. But in Rackspace's case, the company rebooted its entire public cloud, region by region, to correct the bug. Following the massive reboot, Rackspace president and CEO Taylor Rhodes issued an apology to customers regarding the downtime and inconvenience.
And just last week, IBM SoftLayer also announced reboots for its public and private node virtual servers to apply the Xen hypervisor security update.
The Xen hypervisor flaw itself was a relatively minor issue, according to the open source community Xen Project, and had nothing to do with the recent high-profile Heartbleed or Shellshock vulnerabilities. However, the flaw could have allowed a malicious virtual server to read data about other virtual machines running on the same physical host server or hypervisor. In addition to exposing data, an attacker might have been able to crash host servers.
Are future cloud computing hypervisor issues likely?
Hypervisors, which are also called virtual machine managers, are crucial to cloud architecture because they allow providers to run multiple operating systems on a single host server. In cloud computing, hypervisors are generally considered to be one of the more stable and secure components of cloud architectures.
However, experts say future hypervisor flaws and subsequent forced cloud provider reboots are likely, and perhaps inevitable.
"We're going to have security updates like this to hypervisors, whether it's Xen or KVM or Hyper-V, in the future," said John Burke, principal research analyst at Mokena, Ill.-based Nemertes Research Group Inc. "The question is, 'Will they have this kind of impact?' And I think the answer to that, at least in the next year or so, is yes."
The problem, Burke said, is that many public cloud services don't support live migration, which allows the movement of a virtual machine from one physical host server to another without having to go offline. Google this year introduced live migration for its Google Compute Engine infrastructure as a service, and other leading cloud providers such as VMware plan to support live migration in the near future.
Further complicating matters for enterprises is that live migration is somewhat untested.
"Live migration hasn't been reliable until recently, and is still somewhat of a risk for certain workloads," said Rich Mogull, CEO and analyst at Securosis, a Phoenix-based security research firm. "It's more important for people to architect for resiliency, and understand that the cloud is not business as usual."
Indeed, Amazon's Barr recommended that enterprises make their AWS architecture more fault-tolerant and suggested such practices as running instances in two or more availability zones and using autoscaling to ensure a set number of instances are running.
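As a rough illustration of Barr's advice, the short Python sketch below checks whether a fleet spread across availability zones would still have enough running instances if any single zone were rebooted at once. It is only a planning aid under stated assumptions: the function names, the zone labels and the minimum-instance threshold are hypothetical, and it does not call any AWS API.

```python
from collections import Counter


def survives_zone_reboot(instance_zones, min_running):
    """Return True if at least `min_running` instances stay up when
    any one zone is taken offline.

    instance_zones: list with one zone label per instance (hypothetical labels).
    min_running: the number of instances the workload needs at all times.
    """
    per_zone = Counter(instance_zones)
    total = len(instance_zones)
    # Worst case: the zone holding the most instances is the one rebooted.
    worst_loss = max(per_zone.values(), default=0)
    return total - worst_loss >= min_running


def zones_at_risk(instance_zones, min_running):
    """List the zones whose reboot would drop the fleet below `min_running`."""
    per_zone = Counter(instance_zones)
    total = len(instance_zones)
    return sorted(z for z, n in per_zone.items() if total - n < min_running)


if __name__ == "__main__":
    # Hypothetical fleet: four instances split evenly over two zones.
    fleet = ["zone-a", "zone-a", "zone-b", "zone-b"]
    print(survives_zone_reboot(fleet, min_running=2))
    print(zones_at_risk(["zone-a", "zone-a", "zone-a", "zone-b"], min_running=2))
```

A fleet that sits entirely in one zone fails this check for any nonzero minimum, which is the situation the multi-zone recommendation is meant to avoid. Autoscaling addresses the other half of the advice by replacing lost instances, which this sketch does not model.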
But even when architecting clouds for resiliency and more fault tolerance, Burke said, enterprises and vendors alike should keep a close eye on their hypervisors to make sure they are properly patched and updated to avoid any security issues.
"It's never a good idea to assume a piece of software is completely finished and won't need updates or patches," Burke said. "There may be a broad perception that hypervisors are more secure than other types of software, and with some justification. But that notion has bled into thinking that hypervisors are secure, and that's just not the case."
Mogull said forced reboots like these "rarely happen," and major hypervisor patches like the Xen security update have only occurred a small number of times over the last five years. But he expects the reboots will lead additional cloud providers to support live migration as a competitive advantage.
Meanwhile, cloud-focused solution providers are preparing for the next hypervisor security update and the cloud reboots that could follow. Matt Johnson, co-founder and CEO of Raven Data Technologies, a solution provider based in Reisterstown, Md., said his company uses Microsoft Azure, which runs on Microsoft's Hyper-V virtualization software and therefore was unaffected by the Xen hypervisor flaw. But he said he's taking the cloud reboots seriously because similar incidents could affect his business and his customers' cloud operations in the future.
"We began using Azure pretty heavily in the last year, so we were lucky to avoid these issues," Johnson said. "But we're going to keep a close eye on [hypervisor security]. If it happened to Xen, then it can happen to others."