Getting Started With Azure Databricks

David Peter Hansen has a quick walkthrough of Azure Databricks:

RUN MACHINE LEARNING JOBS ON A SINGLE NODE

A Databricks cluster has one driver node and one or more worker nodes. The Databricks runtime includes common used Python libraries, such as scikit-learn. However, they do not distribute their algorithms.

Running a ML job only on the driver might not be what we are looking for. It is not distributed and we could as well run it on our computer or in a Data Science Virtual Machine. However, some machine learning tasks can still take advantage of distributed computation and it a good way to take an existing single-node workflow and transition it to a distributed workflow.

This great example notebooks that uses scikit-learn shows how this is done.

Read the whole thing.

Related Posts

From pandas to Spark with koalas

Achilleus tries out Koalas: Python is widely used programming language when it comes to Data science workloads and Python has way too many different libraries to back this fact. Most of the data scientists are familiar with Python and pandas mostly. But the main issue with Pandas is it works great for small and medium […]

Read More

Quick Hits on Managed Instance Backup / Restore

Jovan Popovic has some pieces of advice for backing up and restoring databases on Azure SQL Managed Instances: Managed Instance takes automatic backups (full backups every week, differential every 12 hours, and log backups every 5-10 min) that you can use to restore a database to some point of time in past within the retention […]

Read More

Categories

August 2018
MTWTFSS
« Jul Sep »
 12345
6789101112
13141516171819
20212223242526
2728293031