
Using Koalas on Azure Databricks

Ginger Grant shows how you can install the koalas library on an Azure Databricks cluster:

Unfortunately, if you are using an ML workspace, this will not work, and you will get the error message: org.apache.spark.SparkException: Library utilities are not available on Databricks Runtime for Machine Learning. The Koalas GitHub documentation says “In the future, we will package Koalas out-of-the-box in both the regular Databricks Runtime and Databricks Runtime for Machine Learning”. What this means is if you want to use it now

Most of the time I want to install on the whole cluster, as I segment libraries by cluster. This way, if I want those libraries, I just connect to the cluster that has them. The easiest way to install a library is to open a running Databricks cluster (start it if it is not running), then go to the Libraries tab at the top of the screen.
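For readers who would rather script the cluster-wide install than click through the Libraries tab, here is a minimal sketch. It is not from Ginger's post. It assumes the Databricks Libraries REST API (POST /api/2.0/libraries/install) and a personal access token. The function name build_install_request, the example host, the token, and the cluster ID are all hypothetical placeholders. The sketch only builds the request; sending it is left as a commented-out line.

```python
import json
import urllib.request


def build_install_request(host, token, cluster_id, package="koalas"):
    """Build (but do not send) a request asking Databricks to install
    a PyPI package on one cluster.

    Assumption: the workspace exposes the Libraries API 2.0 endpoint
    /api/2.0/libraries/install and accepts a bearer token.
    """
    payload = {
        "cluster_id": cluster_id,
        # One entry per library; "pypi" marks a PyPI package.
        "libraries": [{"pypi": {"package": package}}],
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; substitute your own workspace URL, token, and cluster ID.
    req = build_install_request(
        "https://example.azuredatabricks.net", "my-token", "my-cluster-id"
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Because this installs the library at the cluster level rather than from inside a notebook, it follows the same "segment libraries by cluster" approach described above.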

Click through for a demo of what you need to do.