Encryption In ElasticMapReduce

Sai Sriparasa shows how to enable encryption in an ElasticMapReduce cluster:

In this post, I go through the process of setting up the encryption of data at multiple levels using security configurations with EMR. Before I dive deep into encryption, here are the different phases where data needs to be encrypted.

Data at rest

  • Data residing on Amazon S3—S3 client-side encryption with EMR
  • Data residing on disk—the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key System (LUKS)

Data in transit

  • Data in transit from EMR to S3, or vice versa—S3 client side encryption with EMR

  • Data in transit between nodes in a cluster—in-transit encryption via Secure Sockets Layer (SSL) for MapReduce and Simple Authentication and Security Layer (SASL) for Spark shuffle encryption

  • Data being spilled to disk or cached during a shuffle phase—Spark shuffle encryption or LUKS encryption

Turns out this is rather straightforward.

Related Posts

Azure Data Lake Store Gen2

James Serra gives us the low-down on Azure Data Lake Store Gen2 now that it is generally available: When to use Blob vs ADLS Gen2New analytics projects should use ADLS Gen2, and current Blob storage should be converted to ADLS Gen2, unless these are non-analytical use cases that only need object storage rather than hierarchical storage […]

Read More

.Net Core On Docker Connecting Via AD To SQL Server

Michal Poreba shows us how to connect Windows Docker containers running .Net Core to SQL Server via Active Directory when the containers are not connected to the domain: The good news is that it is not an unreasonable requirement and it has been done before. The solution is to use Group Managed Service Accounts (gMSA) […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728