Press "Enter" to skip to content

Using sparklyr

Hossein Falaki and Xiangrui Meng show how to use sparklyr on a Databricks Spark cluster:

We collaborated with our friends at RStudio to enable sparklyr to seamlessly work in Databricks clusters. Starting with sparklyr version 0.6, there is a new connection method in sparklyr: databricks. When calling spark_connect(method = "databricks") in a Databricks R Notebook, sparklyr will connect to the spark cluster of that notebook. As this cluster is fully managed, you do not need to specify any other information such as version, SPARK_HOME, etc.

I’d lean toward sparklyr over SparkR because of the former’s tidyverse-centric view.