Databricks Runtime 5.4

Todd Greenstein announces Databricks Runtime 5.4:

We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks.   Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:
– Simplifies manageability by using the same glue catalog across multiple Databricks workspaces.
– Simplifies integrated security by using IAM Role Passthrough for metadata in Glue.
– Provides easier access to metadata across the Amazon stack and access to data catalogued in Glue.

There are some interesting changes in here.

Related Posts

RDDs, DataFrames, and Datasets in Spark

Brad Llewellyn walks us through the three key data structures in Apache Spark: We see that creating an RDD can be done with one easy function.  In this snippet, sc represents the default SparkContext.  This is extremely important, but is better left for a later post.  SparkContext offers the .textFile() function which creates an RDD from […]

Read More

Azure Data Factory and Schema Drift

Mark Kromer walks us through two techniques we can use in Azure Data Factory to deal with schema drift: Azure Data Factory’s Mapping Data Flows have built-in capabilities to handle complex ETL scenarios that include the ability to handle flexible schemas and changing source data. We call this capability “schema drift“. When you build transformations […]

Read More

Categories

June 2019
MTWTFSS
« May Jul »
 12
3456789
10111213141516
17181920212223
24252627282930