Working with Tables in Databricks

Brad Llewellyn shows us how to build tables (temporary and permanent) and views in Azure Databricks using each of the main languages:

Simply put, an External Table is a table built directly on top of a folder within a data source.  This means that the data is not hidden away in some proprietary SQL format.  Instead, the data is completely accessible to outside systems in its native format.  The main reason for this is that it gives us the ability to create “live” queries on top of text data sources.  Every time a query is executed against the table, the query is run against the live data in the folder.  This means that we don’t have to run ETL jobs to load data into the table.  Instead, all we need to do is put the structured files in the folder and the queries will automatically surface the new data.

Each language has its own way of doing things, but they all use the Hive metastore under the covers.

Related Posts

Spark Streaming DStreams

Manish Mishra explains the fundamental abstraction of Spark Streaming: Before going into details of the operations available on the DStream API, let us look at the input sources from which we can start a Stream. There are multiple ways in which we can get the inputs from e.g. Kafka, Flume, etc. Or simple Idle files. […]

Read More

Multi-Region Replication with Confluent Platform

David Arthur walks us through multi-region replication of Kafka clusters in the Confluent Platform 5.4 preview: Running a single Apache Kafka® cluster across multiple datacenters (DCs) is a common, yet somewhat taboo architecture. This architecture, referred to as a stretch cluster, provides several operational benefits and unlocks the door to many uses cases. Stretch clusters provide […]

Read More

Categories

August 2019
MTWTFSS
« Jul Sep »
 1234
567891011
12131415161718
19202122232425
262728293031