Securing S3 Credentials In Spark Jobs

Jason Pohl shows how to protect credentials for connecting to Amazon Web Services S3 buckets when building Spark jobs:

Since Apache Spark separates compute from storage, every Spark Job requires a set of credentials to connect to disparate data sources. Storing those credentials in the clear can be a security risk if not stringently administered. To mitigate that risk, Databricks makes it easy and secure to connect to S3 with either Access Keys via DBFS or by using IAM Roles. For all other data sources (Kafka, Cassandra, RDBMS, etc.), the sensitive credentials must be managed by some other means.

This blog post will describe how to leverage an IAM Role to map to any set of credentials. It will leverage the AWS’s Key Management Service (KMS) to encrypt and decrypt the credentials so that your credentials are never in the clear at rest or in flight. When a Databricks Cluster is created using the IAM Role, it will have privileges to both read the encrypted credentials from an S3 bucket and decrypt the ciphertext with a KMS key.

That’s only one data source, but an important one.

Related Posts

Active Directory Integration On Ubuntu

Drew Furgiuele shows how to configure SQL Server on Linux to use Windows authentication: In the following post, we’ll walk through joining a Linux SQL Server on Unbuntu to an Active Directory domain, and here’s the steps we’re going to take: Installing the required software and services to enable a Linux host to talk to […]

Read More

Homoglyphs And SQL Injection

Bert Wagner gives us one more reason why blacklists are bad: Homoglpyhs can exist within a character set (like the Latin character set examples above) or they can exist between character sets. For example, you may have the unicode apostrophe ʼ, which is a homoglyph to the Latin single quote character '. How does SQL Server handle unicode […]

Read More


June 2017
« May Jul »