RUN MACHINE LEARNING JOBS ON A SINGLE NODE
A Databricks cluster has one driver node and one or more worker nodes. The Databricks runtime includes common used Python libraries, such as scikit-learn. However, they do not distribute their algorithms.
Running a ML job only on the driver might not be what we are looking for. It is not distributed and we could as well run it on our computer or in a Data Science Virtual Machine. However, some machine learning tasks can still take advantage of distributed computation and it a good way to take an existing single-node workflow and transition it to a distributed workflow.
This great example notebooks that uses scikit-learn shows how this is done.
Read the whole thing.