Encryption In ElasticMapReduce

Sai Sriparasa shows how to enable encryption in an Amazon Elastic MapReduce (EMR) cluster:

In this post, I go through the process of setting up the encryption of data at multiple levels using security configurations with EMR. Before I dive deep into encryption, here are the different phases where data needs to be encrypted.

Data at rest

  • Data residing on Amazon S3: S3 client-side encryption with EMR
  • Data residing on disk: the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key Setup (LUKS)

Data in transit

  • Data in transit from EMR to S3, or vice versa: S3 client-side encryption with EMR
  • Data in transit between nodes in a cluster: in-transit encryption via Secure Sockets Layer (SSL) for MapReduce and Simple Authentication and Security Layer (SASL) for Spark shuffle encryption
  • Data being spilled to disk or cached during a shuffle phase: Spark shuffle encryption or LUKS encryption
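
The levels listed above correspond to sections of a single EMR security configuration document. As a rough sketch (not taken from the original post), the Python below assembles one as JSON. The key names follow the EMR security configuration format as best I recall it, so check them against the current AWS documentation before relying on them. The KMS key ARN, the S3 path to the certificate bundle, and the function name `build_security_configuration` are all hypothetical placeholders.

```python
import json


def build_security_configuration(kms_key_arn, certs_s3_uri):
    """Return an EMR security configuration as a JSON string.

    Assumption: key names mirror the EMR security configuration format;
    verify them against the AWS documentation for your EMR release.
    """
    config = {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # Data residing on S3: client-side encryption with a KMS key
                "S3EncryptionConfiguration": {
                    "EncryptionMode": "CSE-KMS",
                    "AwsKmsKey": kms_key_arn,
                },
                # Data residing on local disk: LUKS, with the key held in KMS
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            # Data in transit between nodes: certificates zipped in S3
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": certs_s3_uri,
                }
            },
        }
    }
    return json.dumps(config, indent=2)


# Placeholder values for illustration only; substitute your own resources.
security_configuration_json = build_security_configuration(
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    certs_s3_uri="s3://example-bucket/certs/my-certs.zip",
)
print(security_configuration_json)
```

One way to register the result would be boto3's `create_security_configuration(Name=..., SecurityConfiguration=security_configuration_json)` on an EMR client, then referencing that name when launching the cluster. That step is left out here so the sketch runs without AWS credentials.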

Turns out this is rather straightforward.
