Encryption In ElasticMapReduce

Sai Sriparasa shows how to enable encryption in an ElasticMapReduce cluster:

In this post, I go through the process of setting up the encryption of data at multiple levels using security configurations with EMR. Before I dive deep into encryption, here are the different phases where data needs to be encrypted.

Data at rest

  • Data residing on Amazon S3—S3 client-side encryption with EMR
  • Data residing on disk—the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key System (LUKS)

Data in transit

  • Data in transit from EMR to S3, or vice versa—S3 client side encryption with EMR

  • Data in transit between nodes in a cluster—in-transit encryption via Secure Sockets Layer (SSL) for MapReduce and Simple Authentication and Security Layer (SASL) for Spark shuffle encryption

  • Data being spilled to disk or cached during a shuffle phase—Spark shuffle encryption or LUKS encryption

Turns out this is rather straightforward.

Related Posts

Databricks Runtime 5.4

Todd Greenstein announces Databricks Runtime 5.4: We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks.   Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:– Simplifies manageability by using the same glue catalog across multiple Databricks workspaces.– Simplifies integrated […]

Read More

Building an AKS Cluster

Mohammad Darab continues a series on Big Data Clusters by creating a Kubernetes pod in Azure Kubernetes Service: Next, we will create a resource group by executing the following command:az group create –name nameOfMyresourceGroup –location eastus2 Once you execute the above command, you can go into the Azure portal and refresh your resource group pane […]

Read More

Categories

February 2017
MTWTFSS
« Jan Mar »
 12345
6789101112
13141516171819
20212223242526
2728