Getting Started With Azure Databricks

David Peter Hansen has a quick walkthrough of Azure Databricks:

RUN MACHINE LEARNING JOBS ON A SINGLE NODE

A Databricks cluster has one driver node and one or more worker nodes. The Databricks runtime includes common used Python libraries, such as scikit-learn. However, they do not distribute their algorithms.

Running a ML job only on the driver might not be what we are looking for. It is not distributed and we could as well run it on our computer or in a Data Science Virtual Machine. However, some machine learning tasks can still take advantage of distributed computation and it a good way to take an existing single-node workflow and transition it to a distributed workflow.

This great example notebooks that uses scikit-learn shows how this is done.

Read the whole thing.

Related Posts

Kafka 2.3 and Kafka Connect Improvements

Robin Moffatt goes over improvements in Kafka Connect with the release of Apache Kafka 2.3: A Kafka Connect cluster is made up of one or more worker processes, and the cluster distributes the work of connectors as tasks. When a connector or worker is added or removed, Kafka Connect will attempt to rebalance these tasks. Before version 2.3 of Kafka, […]

Read More

The Uniqueness of Cosmos DB Unique Keys

Hasan Savran explains the scope of unique keys in Cosmos DB: I wrote about Unique Keys and tried to explain how they work in one of my earlier post. It’s common to use SQL Server’s Primary Key or Unique Indexes to explain Unique Keys of Azure Cosmos DB. If you have a Primary Key in a […]

Read More

Categories

August 2018
MTWTFSS
« Jul Sep »
 12345
6789101112
13141516171819
20212223242526
2728293031