Securing S3 Credentials In Spark Jobs

Jason Pohl shows how to protect credentials for connecting to Amazon Web Services S3 buckets when building Spark jobs:

Since Apache Spark separates compute from storage, every Spark Job requires a set of credentials to connect to disparate data sources. Storing those credentials in the clear can be a security risk if not stringently administered. To mitigate that risk, Databricks makes it easy and secure to connect to S3 with either Access Keys via DBFS or by using IAM Roles. For all other data sources (Kafka, Cassandra, RDBMS, etc.), the sensitive credentials must be managed by some other means.

This blog post will describe how to leverage an IAM Role to map to any set of credentials. It will leverage the AWS’s Key Management Service (KMS) to encrypt and decrypt the credentials so that your credentials are never in the clear at rest or in flight. When a Databricks Cluster is created using the IAM Role, it will have privileges to both read the encrypted credentials from an S3 bucket and decrypt the ciphertext with a KMS key.

That’s only one data source, but an important one.

Related Posts

Whitelisting SQL CLR Assemblies

Niels Berglund walks through the process of whitelisting a CLR assembly in SQL Server 2017: What Microsoft introduces in SQL Server 2017 RC1, is something I refer to as whitelisting. It is somewhat similar to the TRUSTWORTHY setting, where you indicate that a database is to be trusted. But instead of doing it on the database level, you […]

Read More

Certificate Copying

Brian Carrig shows how to create certificates from binary: Sometimes it is necessary to copy a certificate from one database to another database. The most common method I have seen to do this is involves taking a backup of the certificate to disk from one database and then restoring the certificate to the other database. […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930