Techniques for sensitive data discovery in the cloud

Tracking data is complex in cloud environments, but there are a number of tools and compliance activities organizations can leverage.

Most information security professionals face a quandary every day. The security organization is chartered with securing its firm’s critical data. But where is that data? In a large, complex and rapidly changing environment, this seemingly simple question can be effectively unanswerable.

This complexity derives from the fact that the number of paths data can travel within complex environments is near limitless, and the amount of data in scope can be astronomical. Trying to catalogue and inventory data within even a small or midsized IT ecosystem is worse than looking for a needle in a haystack. This lack of visibility into where data lives and how it travels throughout an organization becomes even more problematic with cloud computing.

In a typical cloud scenario, sensitive data can interface in unexpected ways with external processes and third parties. Specifics vary by cloud model and usage context, but cloud computing can change where data is stored (and who stores it), the pathways data can take, and the organizational borders it crosses. For example, consider a SaaS application deployment where application data might be migrated in bulk (such as in the form of a database export or other structured data exchange) to a software vendor to bootstrap application processing. Or, in an IaaS model, IT services might move from on-premises data centers to managed virtual data centers, taking sensitive data with them.

When characteristics of the data lifecycle change in a context where that lifecycle is only partially understood, the change creates risk. Data can wind up in an inappropriate environment (for example, credit card data moving to a non-PCI environment) or in one that doesn’t implement required security controls. That leaves us back at the initial quandary: How can security be maintained when the dynamics of that data -- its type, its flow and where it is stored -- aren’t fully understood? Let’s look at some methods for sensitive data discovery during and after a cloud migration.

Sensitive data discovery in transit

The first step to breaking free from this unfortunate situation is to attempt to locate and flag sensitive data as it leaves the organizational boundary. As most security pros know, a full data inventory isn’t usually a viable way to accomplish that. However, a targeted inventory covering only what’s in transit can be effective.

Imagine a warehouse that’s organized by types of items it stores, but without a full inventory.  If the organization running this warehouse were to buy another warehouse, it would need a strategy to know which warehouse location to look in to find a particular item. It remains prohibitively expensive to inventory every item in the original warehouse, but what about keeping track of only items that leave the warehouse?  What if the organization creates shipping manifests of just items being transferred from the old location to the new?  Then it knows what’s going to the new location and by process of elimination can determine what’s still located at the old facility; it doesn’t need a full inventory to know whether important items were transferred.  This same idea  can be useful for a cloud deployment.

During a cloud migration, outbound data sent to a provider (for example, on magnetic storage media or through a structured data transfer) can be inspected to determine whether it is regulated and/or sensitive. Automated data discovery methods like data loss prevention (DLP) tools can help, as can open source tools designed to find certain kinds of regulated data (for example, the open source ccsrch tool that finds and flags credit card information). If neither of those options is feasible, technically minded shops might invest in creating regular expressions to parse data, and less technical ones can (as a last resort) attempt manual inspection of the data.
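For shops taking the regular expression route, the following Python sketch shows roughly what such a check might look like for credit card numbers. It is an illustration under stated assumptions, not a description of how ccsrch or any DLP product works: the pattern (13 to 16 digits, optionally separated by single spaces or hyphens) and the function names are hypothetical choices, and the Luhn checksum is an addition used here only to weed out random digit strings.

```python
import re

# Assumed pattern: 13 to 16 digits, optionally separated by single
# spaces or hyphens, and not embedded in a longer run of digits.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,15}(?!\d)")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return candidate card numbers (separators stripped) that pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Run against each outbound export file, a check like this yields a list of files to flag for review. Expect both false positives (other identifiers that happen to pass the checksum) and misses (encoded, split or binary-packed values), so treat the results as a starting point for inspection rather than a verdict.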

Sensitive data discovery after a cloud migration

Keeping track of data as it leaves is a great start, but information security (or even IT, for that matter) isn’t always brought in before a cloud service provider is engaged and brought online. Large amounts of data may already be hosted outside the organizational boundary by the time security finds out about the project. But it’s not too late for an organization in this boat. A few existing activities can be leveraged to locate sensitive data outside the organization. For example, application inventories are a required part of HIPAA and PCI DSS compliance activities and can be directly leveraged to find areas of regulated information at a service provider.

Business impact dependency chains created as part of the business impact assessment (BIA) phase of continuity planning can provide valuable data as well.  Within the security organization itself, data collected to create data flow diagrams (DFDs) used as part of threat modeling can be a data source. And for in-house developed systems that follow a formal development/design process, component interaction diagrams like Unified Modeling Language (UML) sequence or object interaction diagrams -- often created as part of development or deployment activities -- can be used as a “quick and dirty” map of data exchange, to the extent they include and represent key data elements.

Many security pros will tell you that having a complete and thorough inventory is a useful best practice.  However, it’s not always possible to accomplish.  So having a few backup strategies like these in your back pocket that don’t require significant funding beyond what you currently have deployed can help control the bleeding while you look for resources to get to that broader inventory. 

About the author:

Ed Moyle is a senior security strategist with Savvis as well as a founding partner of Security Curve.
