The Databricks File System

Brad Llewellyn takes us through the Azure Databricks File System:

Today, we’re going to talk about the Databricks File System (DBFS) in Azure Databricks.  If you haven’t read the previous posts in this series, IntroductionCluster Creation and Notebooks, they may provide some useful context.  You can find the files from this post in our GitHub Repository.  Let’s move on to the core of this post, DBFS.

As we mentioned in the previous post, there are three major concepts for us to understand about Azure Databricks, Clusters, Code and Data.  For this post, we’re going to talk about the storage layer underneath Azure Databricks, DBFS.  Since Azure Databricks manages Spark clusters, it requires an underlying Hadoop Distributed File System (HDFS).  This is exactly what DBFS is.  Basically, HDFS is the low cost, fault-tolerant, distributed file system that makes the entire Hadoop ecosystem work.  We may dig deeper into HDFS in a later post.  For now, you can read more about HDFS here and here.

Click through for more detail on DBFS.

Related Posts

Comparing Performance: HBase1 vs HBase2

Surbhi Kochhar takes us through performance improvements between HBase version 1 and HBase version 2: We are loading the YCSB dataset with 1000,000,000 records with each record 1KB in size, creating total 1TB of data. After loading, we wait for all compaction operations to finish before starting workload test. Each workload tested was run 3 […]

Read More

The Transaction Log in Delta Tables

Burak Yavuz, et al, explain how the transaction log works with Delta Tables in Apache Spark: When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

August 2019
MTWTFSS
« Jul  
 1234
567891011
12131415161718
19202122232425
262728293031