From a customer perspective, part of the appeal of the cloud is the ability to focus internal resources directly on solving business problems instead of on the minutiae of providing the substrate on which the organization's products and services run. As a consequence of this shift in focus, it can be easy to overlook issues that occur at the lower levels of the application stack.
For example, over the past 16 to 24 months, several virtual machine (VM) escape vulnerabilities have been disclosed that customers might not be aware of, including a VMware SVGA driver issue (CVE-2017-4903) and a Hyper-V issue (CVE-2017-0109). The Xen hypervisor in particular has had a number of VM escape issues related to paravirtualization in the past few years, such as CVE-2016-7092 and, most recently, CVE-2017-8903, CVE-2017-8904 and CVE-2017-8905.
What this means in practice is that, regardless of the hypervisor product you or your cloud provider uses, chances are good that at least one VM escape vulnerability has affected your internal virtualization infrastructure or a cloud environment you rely on, and possibly more than one. This matters because, even though the flaw sits at a level of the stack the customer does not directly control, its risks and impacts can propagate all the way up to your application, your data and the business logic your organization's processes depend on.
To make an organization's cloud security posture as robust as possible, then, security practitioners should account for VM escape issues in their overall threat model. Practitioners don't often think this way; many assume that attacks against the virtualization segmentation model are impossible, or so infrequent that they don't warrant active consideration.
In recent years, VM escape vulnerabilities, though still rare, have occurred. While practitioners can't always control what goes on under the hood in their cloud providers' environments, they can take steps to help minimize the potential impacts if they consider the possibility of a VM escape at the outset.
To understand how practitioners can build in resilience against this type of issue, it's important to first understand the specific problems that can arise. With that in mind, let's look in depth at Xen paravirtualization (PV): what it is and how issues can arise from it. This background is useful as we build strategies to mitigate the impact of a VM escape at the level of the application, which we do control.
An easy way to understand PV is to contrast it with full virtualization, the approach it grew out of. Early VMs were designed to fully emulate the hardware an OS expects to run on. From the point of view of an OS running on a hypervisor, the VM looked exactly like a physical machine.
It was, in turn, the hypervisor's job to seamlessly emulate any service the underlying physical hardware might provide: the BIOS, hardware interrupts, devices, memory layout and everything else beneath the OS.
As virtualization rose to prominence, though, those working in the space realized that performance and efficiency gains were possible if the OS became an active and willing participant in the virtualization process. In this model, the operating system is designed with virtualization in mind and works collaboratively with the hypervisor instead of assuming it is running on bare metal.
By doing this, the OS shifts from a model that requires the hypervisor to emulate hardware to one where the guest operating system uses drivers or APIs that interface more directly, and more efficiently, with the underlying hypervisor OS. In Xen's case, these include hypercalls, which a PV guest issues much as an application issues system calls to a kernel, and paravirtualized split device drivers. The downside of this approach is that only virtualization-aware operating systems, meaning those developed and compiled with virtualization in mind, can use it. The upside is better performance and more efficient resource management.
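To find out whether a given Linux instance is itself running as a Xen PV guest, one option is to read the sysfs entries that Linux kernels running on Xen typically expose under /sys/hypervisor. The sketch below assumes a Linux guest: /sys/hypervisor/type reads "xen" on Xen guests, and newer kernels also expose /sys/hypervisor/guest_type with values such as "PV", "HVM" or "PVH". The function name and the "unknown" fallback are illustrative choices, not part of any standard tool.

```python
from pathlib import Path
from typing import Optional


def xen_guest_type(sysfs_root: str = "/sys/hypervisor") -> Optional[str]:
    """Best-effort check of how this Linux instance runs under Xen.

    Returns the kernel-reported guest type (for example "PV", "HVM" or "PVH"),
    "unknown" if the hypervisor is Xen but the kernel does not report a type,
    or None if no Xen hypervisor is reported at all.
    """
    root = Path(sysfs_root)
    try:
        hypervisor = (root / "type").read_text().strip()
    except OSError:
        return None  # no readable "type" entry: not Xen, or sysfs support absent
    if hypervisor != "xen":
        return None
    try:
        return (root / "guest_type").read_text().strip()
    except OSError:
        return "unknown"  # older kernels expose "type" but not "guest_type"
```

Calling xen_guest_type() with no argument reads the real /sys/hypervisor; the sysfs_root parameter exists only so the logic can be pointed at a test directory. A result of "PV" is worth recording in the application's threat model, since it means the instance depends on exactly the paravirtualized interfaces this article discusses.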
However, from an engineering perspective, the PV approach is arguably more difficult to secure. Why? Because the guest OS knowingly and purposefully interfaces with the hypervisor OS without going through an abstraction layer of emulated physical hardware. In addition to the well-defined sandbox with strict boundaries enforced by the emulated hardware platform, the guest OS has access to interfaces that communicate with the hypervisor OS more directly. In that model, both the emulated sandbox and these more direct interfaces need to enforce the segmentation model, and that larger attack surface brings additional avenues where issues might arise.
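To make the segmentation point concrete, here is a deliberately simplified toy model in Python; it is not Xen code. It assumes a hypothetical hypervisor that tracks which guest domain owns each memory frame and exposes two hypothetical hypercalls (a hypercall being the call a PV guest makes directly into the hypervisor instead of touching emulated hardware). Each direct interface has to repeat the ownership check on every guest-supplied argument. Real hypervisors perform far more elaborate validation, but a gap in any one such check is the general kind of flaw that can let a guest reach outside its sandbox.

```python
from dataclasses import dataclass, field
from typing import Dict


class SegmentationViolation(Exception):
    """Raised when a guest asks the toy hypervisor to touch memory it does not own."""


@dataclass
class ToyHypervisor:
    """Hypothetical model: NOT how Xen is implemented, only the shape of the check."""

    # Machine frame number -> ID of the guest domain that owns it.
    frame_owner: Dict[int, int] = field(default_factory=dict)
    # Machine frame number -> contents of that frame.
    memory: Dict[int, bytes] = field(default_factory=dict)

    def _check_owner(self, domid: int, frame: int) -> None:
        # The segmentation check. Every guest-supplied frame number must belong
        # to the calling domain; skipping this on any one interface would let a
        # guest read or write another tenant's memory.
        if self.frame_owner.get(frame) != domid:
            raise SegmentationViolation(f"dom{domid} does not own frame {frame}")

    def hypercall_read_frame(self, domid: int, frame: int) -> bytes:
        self._check_owner(domid, frame)
        return self.memory.get(frame, b"")

    def hypercall_write_frame(self, domid: int, frame: int, data: bytes) -> None:
        self._check_owner(domid, frame)
        self.memory[frame] = data


# Example: dom1 owns frame 10, dom2 owns frame 20.
hv = ToyHypervisor(frame_owner={10: 1, 20: 2}, memory={20: b"tenant-2 data"})
hv.hypercall_write_frame(1, 10, b"mine")  # allowed: dom1 owns frame 10
try:
    hv.hypercall_read_frame(1, 20)  # rejected: frame 20 belongs to dom2
except SegmentationViolation as err:
    print("blocked:", err)  # prints "blocked: dom1 does not own frame 20"
```

Routing both calls through one _check_owner helper mirrors the argument above: every additional direct interface is one more place that must get the check right.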
Insulating against VM escape
From the point of view of a practitioner, VM escape issues can seem almost impossible to address. In a multi-tenant infrastructure-as-a-service environment, the possibility of VM escape is clearly a problem, and it is compounded by the fact that the customer has little to no control over, or even visibility into, the hypervisor, and little to no ability to deploy countermeasures at levels of the stack below the workloads it runs.
Despite this, there are steps we can take to help offset a potential VM escape. Start from a realistic baseline: should an attacker be able to run arbitrary code on one of our OS instances, there's really nothing we can do to fully protect against that. We can and should implement layered defenses to make it as difficult as possible for them, but at the end of the day, the first of Microsoft's immutable laws of security still applies: "If a bad guy can persuade you to run his program on your computer, it's not ... your computer anymore."
With that in mind, a few measures can still help. First, robust logging, particularly logging that is exported off the guest OS to a central repository, can help you identify an issue as it unfolds or trace it afterward. Exporting matters because logs that remain only on a compromised guest can be altered or deleted by the attacker. Correlating that log data with threat intelligence can help you spot a potential compromise earlier than you otherwise would.
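As a minimal sketch of both ideas, the Python below uses the standard library's logging.handlers.SysLogHandler to forward records to a remote syslog collector, plus a small function that flags log lines mentioning IP addresses from a threat intelligence feed. The collector address, the logger name "offhost" and the plain list of bad IPs are assumptions for illustration. Production setups typically use an authenticated, encrypted transport or a provider's logging agent rather than plain UDP syslog, and real threat intelligence feeds carry far more than IP addresses.

```python
import ipaddress
import logging
import logging.handlers
import re
from typing import Iterable, List

# Loose dotted-quad pattern; candidates are validated with ipaddress below.
_IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def make_forwarding_logger(collector_host: str, collector_port: int = 514) -> logging.Logger:
    """Return a logger that ships records off the guest to a syslog collector.

    Assumption: a syslog collector listens on collector_host:collector_port (UDP).
    Call once per process; each call attaches another handler.
    """
    logger = logging.getLogger("offhost")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(collector_host, collector_port))
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def match_indicators(lines: Iterable[str], bad_ips: Iterable[str]) -> List[str]:
    """Return the log lines that mention any IP address from a threat-intel list."""
    bad = {ipaddress.ip_address(ip) for ip in bad_ips}
    hits: List[str] = []
    for line in lines:
        for candidate in _IPV4_CANDIDATE.findall(line):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. "999.1.1.1" fits the pattern but is not a valid address
            if address in bad:
                hits.append(line)
                break  # one hit per line is enough


    return hits


# Example usage (collector address is a placeholder):
# log = make_forwarding_logger("logs.example.internal")
# log.warning("suspicious login from %s", "203.0.113.7")
```

Ideally the collector lets the guest append records but not modify or delete them, so a compromised instance cannot erase its own trail; match_indicators can then run on the collector side rather than on the guest.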
Encrypting data can also provide some value. Yes, should the guest OS be fully compromised, the attacker may be able to locate and defeat whatever encryption methodology you have in place, since the guest OS has to encipher or decipher the data to store, process or transmit it in the first place. However, when the attacker is opportunistic rather than targeting you specifically, this speed bump might be enough to send them looking for a softer target elsewhere.
Probably the most important step you can take is to actively include the possibility of a VM escape in the predeployment planning you do for new applications and for applications you are moving to the cloud. If you threat model applications before you deploy them, explicitly evaluate VM escape in that analysis. If you conduct periodic risk assessments, think through VM escape as part of that exercise.
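One way to make that evaluation explicit is to treat the hypervisor boundary as failed and ask which assets still have any protection left. The sketch below assumes a hypothetical, minimal inventory format: each asset carries a sensitivity label and a set of control names, and the control labels ("hypervisor_isolation", "offhost_logging", "encryption_at_rest") are illustrative stand-ins for whatever taxonomy your threat model already uses. The extra rule about off-host logging for high-sensitivity assets is an example policy, not a standard.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical control labels; substitute the taxonomy your threat model uses.
HYPERVISOR_ISOLATION = "hypervisor_isolation"
OFFHOST_LOGGING = "offhost_logging"


@dataclass
class Asset:
    name: str
    sensitivity: str  # e.g. "high", "medium", "low"
    controls: Set[str] = field(default_factory=set)


def vm_escape_review(assets: List[Asset]) -> List[str]:
    """List findings for each asset, assuming the hypervisor boundary has failed."""
    findings: List[str] = []
    for asset in assets:
        # Under the VM escape assumption, hypervisor isolation no longer counts.
        remaining = asset.controls - {HYPERVISOR_ISOLATION}
        if not remaining:
            findings.append(f"{asset.name}: no control survives a VM escape")
        elif asset.sensitivity == "high" and OFFHOST_LOGGING not in remaining:
            # Example policy: high-value assets need an audit trail the guest can't erase.
            findings.append(f"{asset.name}: high sensitivity but no off-host logging")
    return findings
```

For example, an inventory containing a high-sensitivity database protected only by hypervisor isolation and encryption at rest would be flagged for lacking off-host logging, while a cache protected by nothing but hypervisor isolation would be flagged as having no control that survives a VM escape.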
It may not always be possible to fully or even partially address a VM escape situation, but thinking it through ahead of time can help you find areas where you can minimize the damage, slow an attacker down or otherwise insulate what you deploy into the cloud.