Hossein Falaki and Xiangrui Meng show how to use sparklyr on a Databricks Spark cluster:
We collaborated with our friends at RStudio to enable sparklyr to seamlessly work in Databricks clusters. Starting with sparklyr version 0.6, there is a new connection method in sparklyr: databricks. When calling spark_connect(method = "databricks") in a Databricks R Notebook, sparklyr will connect to the spark cluster of that notebook. As this cluster is fully managed, you do not need to specify any other information such as version, SPARK_HOME, etc.
I’d lean toward sparklyr over SparkR because of the former’s tidyverse-centric view.
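As a rough illustration of the connection method described in the quote, here is a minimal sketch. It assumes the code runs inside a Databricks R notebook with sparklyr 0.6 or later and dplyr installed. The table name mtcars_example and the use of the built-in mtcars data set are my own placeholders, not something from the Databricks post.

```r
# Minimal sketch; assumes a Databricks R notebook with sparklyr >= 0.6.
library(sparklyr)
library(dplyr)

# Connect to the cluster attached to this notebook. No version or
# SPARK_HOME is passed because the cluster is managed by Databricks.
sc <- spark_connect(method = "databricks")

# Hypothetical example data: copy R's built-in mtcars data frame to Spark.
# "mtcars_example" is an illustrative table name.
cars_tbl <- copy_to(sc, mtcars, "mtcars_example", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed on the cluster;
# collect() brings the small aggregated result back into the R session.
cars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()
```

The dplyr pipeline at the end is the sort of thing I mean by tidyverse-centric: the same verbs you would use on a local data frame run against a Spark table.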