With more organizations focusing on analytics and machine learning to develop deeper and more insightful business intelligence, there is a great deal of interest in big data technologies that can process many data types at massive scale. Unfortunately, building or purchasing these tools for on-premises use is cost-prohibitive for many organizations, so they're turning to the cloud for help. Many cloud providers offer big data processing services today, ranging from large IaaS providers like AWS -- offering Elastic MapReduce and Kinesis services, among others -- to custom big data providers such as Qubole that only offer big data as a service. However, big data strategies using Hadoop and other technologies have raised many security concerns, such as authentication and authorization issues, data protection, and difficulty in auditing and monitoring data.
Recently, the Cloud Security Alliance published a new guide titled "Big Data Security and Privacy Handbook" that covers one hundred security-related best practices for organizations building cloud-based big data strategies today. A large variety of topics are covered, including secure computations, data protection in nonrelational data stores like NoSQL, logging, event management and monitoring, input validation and filtering, and cryptography and access control considerations for big data environments. The good news is that most of the recommendations from the CSA working group underscore fundamental best practices that security professionals should be heeding regardless of the environment or technology, and organizations can significantly enhance their big data security in the cloud with a number of relatively simple approaches.
First, it's important to define users, add them to appropriate groups and assign reasonable privileges to these groups. This should sound familiar because it is the foundation of access control, and it applies equally to Hadoop and other big data technologies. Protecting data in these environments is a major focal area, as well. User passwords should be protected with strong hashing rather than stored in recoverable form -- hashing algorithms in the SHA-2 family are supported in Hadoop and most relevant big data technologies -- and many deployment environments offer strong encryption like AES-256 for data at rest.
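As a rough illustration of the two basics described above, the following Python sketch maps users to groups, groups to privileges, and stores passwords as salted SHA-2 based hashes. The group names, user names, helper functions and iteration count are all assumptions made for this example; they are not taken from the CSA handbook or from any Hadoop API.

```python
import hashlib
import hmac
import os

# Hypothetical group-to-privilege mapping; names are illustrative only.
GROUP_PRIVILEGES = {
    "analysts": {"read"},
    "data_engineers": {"read", "write"},
    "admins": {"read", "write", "manage_users"},
}

# Hypothetical user-to-group assignments.
USER_GROUPS = {
    "alice": {"analysts"},
    "bob": {"data_engineers"},
}


def is_allowed(user, privilege):
    """Return True if any group the user belongs to grants the privilege."""
    groups = USER_GROUPS.get(user, set())
    return any(privilege in GROUP_PRIVILEGES.get(g, set()) for g in groups)


def hash_password(password, salt=None, iterations=200_000):
    """Derive a salted PBKDF2-HMAC-SHA256 hash; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected, iterations=200_000):
    """Recompute the hash and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Privileges are attached to groups rather than to individual users, so changing what a role may do means editing one entry instead of many. The password is never stored; only the salt and the derived digest are kept.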
While some of the recommendations from the CSA are relatively straightforward and commonplace, others are very specific to big data strategies. Policy-based encryption, which combines authentication and authorization with encryption techniques, may be specific to certain big data technologies and helps to ensure only certain users or groups can access or use private keys within the environment.
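One way to picture policy-based key access is a key store that releases a key only when the caller's groups satisfy the policy attached to that key. The Python sketch below is a simplified assumption of how such a gate might look; the `PolicyKeyStore` class, its methods and the group names are hypothetical, and a real deployment would rely on a key management service rather than an in-memory dictionary.

```python
import secrets


class PolicyKeyStore:
    """Toy key store: each key carries a policy listing permitted groups."""

    def __init__(self):
        self._keys = {}      # key_id -> raw key bytes
        self._policies = {}  # key_id -> set of groups allowed to use the key

    def create_key(self, key_id, allowed_groups):
        """Generate a random 256-bit key and attach its access policy."""
        self._keys[key_id] = secrets.token_bytes(32)
        self._policies[key_id] = set(allowed_groups)

    def get_key(self, key_id, user_groups):
        """Release the key only if the caller is in a permitted group."""
        if key_id not in self._keys:
            raise KeyError(key_id)
        if not self._policies[key_id] & set(user_groups):
            raise PermissionError(f"no group is authorized for key {key_id!r}")
        return self._keys[key_id]
```

A caller in the hypothetical `finance` group could retrieve a key created with `create_key("ledger", {"finance"})`, while a caller in any other group would receive a `PermissionError` and never see the key material.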
A number of the control recommendations focus more on the surrounding environment than the actual cloud-based big data deployment. Some controls are listed for the connecting endpoints -- local certificates for authentication and endpoint antimalware, for example -- and for infrastructure monitoring -- logging, SIEM and other event management services are mentioned. Some categories and controls are definitely oriented toward cloud scenarios specifically, too. Separation of duties for cloud environments, homomorphic encryption for cloud-based big data environments, and data tagging that aligns with data tracking, integrity and ownership in the cloud are all discussed in the list of suggested controls.
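To make the data tagging idea more concrete, the sketch below attaches an owner, a source and a keyed integrity value to a record, so that later changes to either the data or its tag can be detected. The field names, the `tag_record` and `verify_tag` helpers and the choice of HMAC-SHA256 are assumptions for illustration, not a scheme prescribed by the handbook.

```python
import hashlib
import hmac
import json


def _mac(record, meta, key):
    """Compute HMAC-SHA256 over a canonical JSON form of record plus metadata."""
    payload = json.dumps({"record": record, "meta": meta}, sort_keys=True)
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def tag_record(record, owner, source, key):
    """Return a tag carrying ownership, origin and an integrity value."""
    meta = {"owner": owner, "source": source}
    return {**meta, "integrity": _mac(record, meta, key)}


def verify_tag(record, tag, key):
    """Check that neither the record nor its tag metadata has been altered."""
    meta = {"owner": tag["owner"], "source": tag["source"]}
    return hmac.compare_digest(_mac(record, meta, key), tag["integrity"])
```

Because the ownership metadata is covered by the same keyed hash as the record, reassigning a record to a different owner without the key fails verification just as editing the data does.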
While the paper lists a number of controls for big data strategies, it doesn't go into a deep level of detail on most of the recommendations, and many readers will likely need to look elsewhere for more detail in implementing cloud-based big data security controls. The guide recommends encryption in many areas, role definition and privileged user control, and integrity validation and monitoring, but many will be left wondering where to go next.
As more organizations are seeking to implement cloud-based big data strategies, the need to implement stringent security controls will only grow. Organizations moving in this direction will likely need all the help they can get.