With public cloud storage, traditional storage concepts, such as hard disks and RAID arrays, have been replaced by new, more flexible options. Data stored within a cloud platform is virtually independent of its underlying hardware implementation, and it benefits from nearly limitless redundancy options, many of which are part of a default configuration.
Storage terminology has also changed. One of the new storage concepts that Amazon uses is that of a bucket as a container for storing data. It is easiest to see Amazon cloud buckets as incredibly flexible, highly accessible, distributed folders. These buckets can be hosted in a chosen geographic region, and elements such as logging and performance can be adjusted to match the requirements and budget of the customer.
Security considerations of cloud buckets
However, this flexibility does not come without risks. Many cloud users knowingly or unknowingly allow public access to their cloud buckets and the content within them. In some cases, this is caused by a misconfiguration; in other cases, it is caused by a lack of understanding of the relatively new technology. Whatever the underlying reasons are, unsecured buckets have already led to many data breaches and will likely continue to do so in the future.
An Amazon Simple Storage Service (S3) bucket access misconfiguration by web company LocalBlox, for instance, caused a major incident in February 2018. LocalBlox stored a 1.2 TB file containing 48 million records of users' internet behavior linked to their IP addresses inside a publicly accessible S3 bucket. As soon as the company was notified of the issue, it closed the access down. It is hard to know with certainty if anyone else downloaded a copy of the sensitive -- and valuable -- user data before the access lockdown, and, if so, where that copy could have ended up.
Once data has been publicly accessible for any length of time, it becomes nearly impossible to guarantee that the act of subsequently restricting the access has contained all the data. By then, the cat is out of the bag.
The security issue around cloud data is not new. Over the years, many tools -- such as S3Scanner and AWSBucketDump -- have been developed to probe the cloud platform for publicly accessible cloud buckets, typically by trying bucket names from a predefined word list. Once such a bucket is found, most of these tools can list or dump the contents of the bucket, providing the interested party with an easy and automated way to access the exposed data.
The latest trend is to use certificate transparency logs for scanning efficiency. These tools no longer need to brute force all the entries on a predefined word list because they use permutations of domain names found in certificate transparency logs, which makes the process more targeted and, thus, quicker.
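To make the permutation idea concrete, here is a minimal Python sketch that a team could use to audit its own domains. It assumes the domain names have already been collected from certificate transparency logs. The affix list, the separators and the function name candidate_bucket_names are hypothetical and are not taken from any particular tool.

```python
from itertools import product

# Hypothetical affixes; real tools ship their own, much longer lists.
AFFIXES = ["backup", "dev", "logs"]
SEPARATORS = ["-", "."]


def candidate_bucket_names(domain, affixes=AFFIXES, separators=SEPARATORS):
    """Return sorted candidate bucket names derived from one domain name."""
    labels = domain.lower().rstrip(".").split(".")
    # Drop empty labels and the wildcard label found in some certificate entries.
    labels = [label for label in labels if label and label != "*"]
    if not labels:
        return []
    # Base names: the full name, joined with dots and with hyphens.
    bases = {".".join(labels), "-".join(labels)}
    if len(labels) > 1:
        # Also try the name without its last label (a naive way to drop the TLD).
        bases.add(".".join(labels[:-1]))
        bases.add("-".join(labels[:-1]))
    names = set(bases)
    for base, affix, sep in product(bases, affixes, separators):
        names.add(f"{base}{sep}{affix}")
        names.add(f"{affix}{sep}{base}")
    # S3 bucket names are 3 to 63 characters long.
    return sorted(name for name in names if 3 <= len(name) <= 63)
```

Calling candidate_bucket_names("*.example.com") yields names such as example-backup and logs.example.com. Because the list is built from real domain names rather than a generic dictionary, it stays short and targeted.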
Vendors have now started to accept some of the responsibility for cloud bucket security issues, and some interesting, more proactive measures have recently been made available to cloud users.
Traditional security controls and processes still apply, but they often fail due to human error or a lack of understanding of the platform. Access rights need to be properly set and reviewed on a regular basis. Proactive scans of the customer's own environment, using the enumeration scripts mentioned above or broader vulnerability scans, need to be performed and monitored. None of this is new; these policies should already be in place.
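A regular access review can be partly automated. The sketch below is one possible check, not a complete audit: it takes a bucket ACL in the dictionary shape that the boto3 call get_bucket_acl returns and reports grants made to Amazon's predefined public groups. The function name public_grants is hypothetical, and bucket policies, which can also grant public access, are not covered.

```python
# Predefined S3 groups that make a grant effectively public.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_grants(acl):
    """Return (group URI, permission) pairs that expose a bucket to public groups."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS:
            findings.append((grantee["URI"], grant.get("Permission")))
    return findings
```

With boto3 installed and credentials configured, the input could come from boto3.client("s3").get_bucket_acl(Bucket=name). An empty result only means that the ACL itself grants nothing to the public groups.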
One of the more interesting recent developments was the release of Amazon Macie in August 2017. Amazon Macie uses machine learning, including natural language processing, to automatically discover and classify data stored inside Amazon S3 buckets. This kind of automated classification might very well be the future.
It is clear that human error cannot be reduced to zero, so putting near-real-time automated controls in place to contain the risk once such an error inevitably occurs is a good approach.
Another option is to enable Amazon's Default Encryption feature, which will automatically encrypt any new object placed inside a bucket. Other available features include Amazon's permission checks and alarms and the use of access control lists.
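As an illustration, default encryption can be switched on programmatically. The following Python sketch assumes the boto3 SDK and uses its put_bucket_encryption call. The helper names default_encryption_config and enable_default_encryption are hypothetical, as are the bucket name and key ID a caller would pass in.

```python
def default_encryption_config(kms_key_id=None):
    """Build the configuration dictionary for S3 default encryption."""
    if kms_key_id:
        # Encrypt with a customer-chosen KMS key.
        rule = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    else:
        # Encrypt with S3-managed keys.
        rule = {"SSEAlgorithm": "AES256"}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": rule}]}


def enable_default_encryption(s3_client, bucket, kms_key_id=None):
    """Apply default encryption to a bucket using a boto3 S3 client."""
    s3_client.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=default_encryption_config(kms_key_id),
    )
```

A caller would pass in a client created with boto3.client("s3"). Keeping the configuration builder separate from the API call makes the setting easy to review and to test without touching a real bucket.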
It is also critical to monitor public access and API calls. Alerts should be set and acted upon to cover the dumping of large numbers of files or of large files in general. A SIEM can assist in correlating the required security event data for these alerts via rules and set thresholds.
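The kind of threshold rule a SIEM would apply can be sketched in a few lines of Python. This is a simplified illustration, not a product configuration: it assumes that access events have already been parsed into (timestamp, source IP, bytes sent) tuples sorted by time, and the function name download_alerts and the default thresholds are hypothetical.

```python
from collections import defaultdict, deque


def download_alerts(events, max_objects=100, max_bytes=1_000_000_000, window_seconds=300):
    """Return sorted source IPs that exceed either threshold in any sliding window.

    events: iterable of (timestamp_seconds, source_ip, bytes_sent), sorted by time.
    """
    recent = defaultdict(deque)  # per-IP events inside the current window
    totals = defaultdict(int)    # per-IP bytes inside the current window
    flagged = set()
    for ts, ip, size in events:
        window = recent[ip]
        window.append((ts, size))
        totals[ip] += size
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_size = window.popleft()
            totals[ip] -= old_size
        if len(window) > max_objects or totals[ip] > max_bytes:
            flagged.add(ip)
    return sorted(flagged)
```

One rule covers both cases from the text: many downloads in a short period trip the object count, while a single very large file trips the byte count. Suitable thresholds depend on what normal traffic to the bucket looks like.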
Data breaches through cloud storage are a problem that will not go away. There are many reasons why this topic is still such an issue, but there are mitigation options and there have been some promising developments in this space. Amazon has even been proactively contacting users whose data is publicly accessible. When all these options are combined, they make up a solid, holistic security suite that should be more than sufficient to address the ongoing concerns -- but it is up to security professionals to implement and maintain them.