Databricks Cluster-Scoped Init Scripts

Aayush Bhasin shares some background on a Databricks intern project, adding cluster-scoped initialization scripts to Databricks clusters:

One of the biggest pain points for customers used to be that init scripts for a cluster were not part of the cluster configuration and did not show up in the User Interface. Because of this, applying init scripts to a cluster was unintuitive, and editing or cloning a cluster would not preserve the init script configuration. Cluster-scoped init scripts addressed this issue by including an ‘Init Scripts’ panel in the UI of the cluster configuration page, and adding an ‘init_scripts’ field to the public API. This also allows init scripts to take advantage of cluster access control.
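To make the 'init_scripts' field concrete, here is a minimal sketch of a cluster specification that carries its init scripts as part of the configuration. Only the 'init_scripts' field name comes from the quoted text. The helper `build_cluster_spec`, the cluster name, the Spark version, the node type, the script path, and the nested `dbfs`/`destination` shape are all assumptions made for illustration, so check them against the Clusters API reference before use. The sketch only builds and prints the payload; it does not call any Databricks endpoint.

```python
import json


def build_cluster_spec(cluster_name, script_paths):
    """Build a hypothetical cluster spec dict that lists its init scripts.

    Every value other than the 'init_scripts' key name is a placeholder
    chosen for illustration, not taken from the original post.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": "4.3.x-scala2.11",  # placeholder runtime version
        "node_type_id": "i3.xlarge",         # placeholder node type
        "num_workers": 2,
        # Assumed shape: one entry per script, each pointing at a DBFS path.
        "init_scripts": [
            {"dbfs": {"destination": path}} for path in script_paths
        ],
    }


spec = build_cluster_spec(
    "demo-cluster",
    ["dbfs:/databricks/init-scripts/install-deps.sh"],  # hypothetical path
)

# Print the JSON body that would be sent with a cluster create or edit request.
print(json.dumps(spec, indent=2))
```

Because the scripts are part of the spec itself, anything that copies the spec (such as cloning or editing a cluster) carries the init script list along with it, which is the behavior the quoted passage describes.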

Read on to see how Aayush & co. solved this issue.

