Challenge: In a distributed programming framework such as MapReduce, parallel compute and storage functions process very large volumes of data. Identifying malicious mappers, and protecting the data that untrusted mappers can access, is one of the top big data security challenges and can hinder big data privacy efforts.
Solution: The CSA notes that there are two techniques available to ensure the trustworthiness of mappers: trust establishment and Mandatory Access Control (MAC).
During trust establishment, the "master" must authenticate each "worker" and verify that it has the expected properties; only workers that pass both checks can be assigned mapper tasks. After this initial qualification, each worker's properties must be rechecked periodically to ensure that mappers continue to meet established policies.
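The two phases of trust establishment can be sketched roughly as follows. This is a minimal illustration, not how any particular MapReduce implementation does it: the `Master` class, the HMAC-based challenge, the shared per-worker keys, and the property names in `REQUIRED_PROPERTIES` are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical policy: properties a worker must report to receive mapper tasks.
REQUIRED_PROPERTIES = {"os_patched": True, "disk_encrypted": True}


class Master:
    """Toy master node that tracks which workers are currently trusted."""

    def __init__(self, worker_keys):
        # worker_keys maps worker id -> shared secret (bytes).
        # Pre-shared keys are an assumption of this sketch.
        self.worker_keys = worker_keys
        self.trusted = set()

    def _authenticated(self, worker_id, nonce, tag):
        # The worker proves knowledge of its key by returning HMAC(key, nonce).
        key = self.worker_keys.get(worker_id)
        if key is None:
            return False
        expected = hmac.new(key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)

    @staticmethod
    def _conforms(properties):
        # Every required property must be present with the required value.
        return all(properties.get(k) == v for k, v in REQUIRED_PROPERTIES.items())

    def establish_trust(self, worker_id, nonce, tag, properties):
        # Initial trust establishment: authentication plus property check.
        ok = self._authenticated(worker_id, nonce, tag) and self._conforms(properties)
        if ok:
            self.trusted.add(worker_id)
        return ok

    def periodic_update(self, worker_id, properties):
        # Periodic trust update: revoke trust if the worker no longer conforms.
        if worker_id in self.trusted and not self._conforms(properties):
            self.trusted.discard(worker_id)
        return worker_id in self.trusted

    def assign_mapper_task(self, worker_id, task):
        # Only currently trusted workers may receive mapper tasks.
        if worker_id not in self.trusted:
            raise PermissionError(f"{worker_id} is not trusted")
        return (worker_id, task)
```

In this sketch a worker that fails a later `periodic_update` call is removed from the trusted set, so `assign_mapper_task` rejects it until it passes `establish_trust` again.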
Alternatively, MAC enforces access according to predefined security policies. However, while MAC protects the input to mappers, it does not prevent data leakage from mapper output. To avoid this, it is critical to apply data de-identification techniques that prevent sensitive information from being distributed.
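As a rough illustration of de-identifying mapper output, the sketch below pseudonymizes direct identifiers with a keyed hash and drops a sensitive field before a record is emitted. The field names, the `deidentify` and `mapper` functions, and the choice of HMAC-SHA256 pseudonyms are assumptions made for the example; the source does not prescribe a specific technique or schema.

```python
import hashlib
import hmac

# Hypothetical field names; the source does not specify a record schema.
DIRECT_IDENTIFIERS = {"name", "email"}
DROPPED_FIELDS = {"ssn"}


def deidentify(record, secret):
    """Return a copy of record with identifiers pseudonymized or removed."""
    out = {}
    for field, value in record.items():
        if field in DROPPED_FIELDS:
            continue  # never emit this field
        if field in DIRECT_IDENTIFIERS:
            # Keyed hash: the same input always maps to the same pseudonym,
            # so records can still be grouped without exposing the raw value.
            digest = hmac.new(secret, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]
        else:
            out[field] = value
    return out


def mapper(records, secret):
    """Toy mapper: emits (city, de-identified record) pairs."""
    for record in records:
        clean = deidentify(record, secret)
        yield clean["city"], clean
```

Keyed hashing alone does not guarantee anonymity, since combinations of the remaining fields may still identify a person, so a real deployment would likely need stronger measures.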