You know the feeling: that weight of dread in your gut as you realize you have a problem -- a big one. You've been breached. You're under attack, or the system is down, or there's a new critical vulnerability with no patch available that leaves production systems wide open. Whatever the cause, it's time to kick off your incident response process before things get worse.
Then that weight in your gut implodes as you realize the affected systems are in the cloud. You don't have network logs because you don't own the network. Or the outage is with your provider, and you can't simply hit reboot. Or you're using your cloud provider's nifty pre-packaged stack and don't have root access to install a patch. Whatever the reason, your incident response just became a whole lot more complex.
The adoption of cloud computing significantly alters the very fabric of incident response. As with many kinds of outsourcing, you give up control over the physical infrastructure, and as a result, your response process suddenly becomes much more complex as new participants -- possibly with competing priorities -- join the game. Yet even with traditional outsourcing, you typically owned at least some of the infrastructure you deployed within it; with the cloud, systems and data may be scattered all over the map on systems you share with other customers of your cloud provider.
Even if you are deploying an internal, private cloud, the response process changes, since you've likely commingled systems and resources in ways that dramatically complicate it. How do you perform a forensic examination of a server when it's a VM running across shared hardware?
I'm breaking my recommendations into two sections: what to do before you move to the cloud (or what you should have done yesterday if you're already there), and what to do during an incident.
Preparing your incident response process
In an ideal world, I'd tell you to never allow a cloud deployment without meeting with your provider's incident response team and mapping out a joint response process for incidents. For some of you, this might be a realistic first step, and as part of your contract, you want to lay out roles and responsibilities, contact numbers, backup contact numbers and secure incident communications channels. Before deploying your application/systems/data, make sure you have a clear understanding of how incidents will be managed, including what kicks off the incident response process in the first place. All this needs to be written into service-level agreements (SLAs) backed by shared financial responsibility if SLAs aren't met.
However, most of you don't live in an ideal world, and while you should strive for such clarity, you might be far more restricted in terms of the information available or the contractual protections you can get. Here are a few key points to help you prepare:
* Understand, as best you can, what systems, data and processes are deployed in the cloud so you know where you might have to respond.
* Find out the response processes of your provider, and, if possible, get contact information and at least say hello so you know your email won't end up in the spam filter. Also make sure the point of contact your cloud provider has for your organization knows who you are and how to reach you if they get the call before you notice something is wrong.
* Know what security and monitoring controls you have in the cloud, and if they aren't sufficient, look for ways to close the gap. For example, do you need, and do you have, a way to monitor network or application traffic? If you need it and don't have it, can you deploy something, and what will the performance and business impacts be?
* For SaaS, you will often be completely reliant on your provider, so make sure you understand its incident response process, what monitoring it has in place, and if that information is accessible to you. Also ask about backups to restore data or operations.
* For IaaS, if you deploy monitoring, make sure it accounts for your cloud architecture. Application logging on an ephemeral instance might not be your best choice.
* Consider whether you need to recover if your cloud provider has an outage (which it will eventually). If so, make sure you have a recovery plan that includes using either internal systems or an alternate provider, collect data that meets your recovery time and point objectives, and have the means to move that data to alternative systems.
* For a private cloud, make sure you understand the response and investigative changes imposed by your use of the technology. Know how you will manage network traffic to shared resource pools, handle forensics, collect logs, update systems, and respond to problems with the cloud infrastructure itself.
* Develop specific response plans for any major systems/applications deployed in the cloud.
* Put a monitoring infrastructure in place to know when to trigger an incident. This might be as simple as some scripts to ping for outages, or as complex as detailed log monitoring.
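As a minimal sketch of the simple end of that spectrum, the Python below probes a service over TCP and signals that an incident should be triggered after several consecutive failures. The function names, the three-failure threshold and the 30-second interval are illustrative assumptions, not settings recommended by any provider.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def watch(probe: Callable[[], bool], checks: int,
          interval: float = 30.0, threshold: int = 3) -> bool:
    """Run probe up to `checks` times.

    Return True as soon as `threshold` consecutive probes fail,
    which is the point at which an incident should be opened.
    Return False if that never happens.
    """
    failures = 0
    for i in range(checks):
        if probe():
            failures = 0  # any success resets the streak
        else:
            failures += 1
            if failures >= threshold:
                return True
        if i < checks - 1:
            time.sleep(interval)
    return False
```

For example, `watch(lambda: tcp_probe("app.example.com", 443), checks=10)` would check a placeholder HTTPS endpoint ten times. Requiring consecutive failures keeps a single dropped connection from kicking off the response process.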
Steps to take during an incident
With so many different cloud deployment options out there, I can't cover every incident response consideration, but here are some suggestions for what to do during an incident:
* Engage your cloud provider's response team sooner rather than later. If you don't have access to them, don't sit around waiting for them to call -- start doing whatever you can to contain and manage the incident.
* If you can't control an incident in the cloud, you may need to shut down those operations and move them internally. For example, if you are using hosted email and there's a breach at your provider and they are unresponsive, you may need to set up a temporary internal server or alternate cloud service. If you don't have this option, and you don't have any ability to escalate with your provider, understand that you might be completely hamstrung. That's why the planning is so important.
* In a joint response, aggressively focus on communications between the incident response teams. This is hard enough when only your own people are involved, never mind a third-party cloud provider’s team you’ve never worked with.
* Leverage the cloud. If, for example, you have an IaaS breach and need to monitor network traffic, work with your internal cloud folks to rapidly spin up a network proxy and reroute traffic through it. It might slow performance and be expensive, but if you need it, you need it.
The most important point to remember is that if you don't plan ahead of time, the odds of successfully handling an incident are very low. Even if you are operating under the tightest restrictions with no ability to engage with your cloud provider, it's far less of a problem if you plan for that and know what you need to do to keep the business running.
About the author:
Rich Mogull has nearly 20 years of experience in information security, physical security, and risk management. Prior to founding independent information security consulting firm Securosis, he spent seven years at Gartner Inc., most recently as a vice president, where he advised thousands of clients, authored dozens of reports and was consistently rated as one of Gartner's top international speakers. He is one of the world's premier authorities on data security technologies, including DLP, and has covered issues ranging from vulnerabilities and threats, to risk management frameworks, to major application security.