An Introduction to Azure Databricks

Brad Llewellyn has an introduction to Azure Databricks:

So, what is Azure Databricks?  To answer this question, let’s start all the way at the bottom of the hole and climb up.  So, what is Hadoop?  Apache Hadoop is an open-source, distributed storage and computing ecosystem designed to handle incredibly large volumes of data and complex transformations.  It is becoming more common as organizations are starting to integrate massive data sources, such as social media, financial transactions and the Internet of Things.  However, Hadoop solutions are extremely complex to manage and develop.  So, many people have worked together to create platforms that layer on top of Hadoop to provide a simpler way to solve certain types of problems.  Apache Spark is one of these platforms.  You can read more about Apache Hadoop here and here.

It’s Hadoop turtles all the way down.

Related Posts

Databricks Runtime 5.4

Todd Greenstein announces Databricks Runtime 5.4: We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks.   Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:– Simplifies manageability by using the same glue catalog across multiple Databricks workspaces.– Simplifies integrated […]

Read More

Building an AKS Cluster

Mohammad Darab continues a series on Big Data Clusters by creating a Kubernetes pod in Azure Kubernetes Service: Next, we will create a resource group by executing the following command:az group create –name nameOfMyresourceGroup –location eastus2 Once you execute the above command, you can go into the Azure portal and refresh your resource group pane […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

June 2019
MTWTFSS
« May  
 12
3456789
10111213141516
17181920212223
24252627282930