Securing S3 Credentials In Spark Jobs

Jason Pohl shows how to protect credentials for connecting to Amazon Web Services S3 buckets when building Spark jobs:

Since Apache Spark separates compute from storage, every Spark Job requires a set of credentials to connect to disparate data sources. Storing those credentials in the clear can be a security risk if not stringently administered. To mitigate that risk, Databricks makes it easy and secure to connect to S3 with either Access Keys via DBFS or by using IAM Roles. For all other data sources (Kafka, Cassandra, RDBMS, etc.), the sensitive credentials must be managed by some other means.

This blog post will describe how to leverage an IAM Role to map to any set of credentials. It will leverage the AWS’s Key Management Service (KMS) to encrypt and decrypt the credentials so that your credentials are never in the clear at rest or in flight. When a Databricks Cluster is created using the IAM Role, it will have privileges to both read the encrypted credentials from an S3 bucket and decrypt the ciphertext with a KMS key.

That’s only one data source, but an important one.

Related Posts

Tuning Apache Spark Applications

Vidisha Gupta has a few tips for tuning Apache Spark programs: Data Serialization – Serialization plays an important role in increasing the performance of any application. Spark provides two serialization libraries – Java Serialization: By default, spark uses Java’s ObjectOutputStream framework which can work with any class that implements java.io.serializable. This serialization is flexible but slow and […]

Read More

Kerberos Authentication In Apache Cassandra

Justin Cameron announces an open source Kerberos authenticator in Apache Cassandra: In conjunction with the Cassandra authenticator, we have also published an open-source Kerberos authenticator plugin for the Cassandra Java driver. The plugin supports multiple Kerberos quality of protection (QOP) levels, which may be specified directly when configuring the authenticator. The driver’s QOP level must match the […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930