Using sparklyr

Kevin Feasel


R, Spark

Hossein Falaki and Xiangrui Meng show how to use sparklyr on a Databricks Spark cluster:

We collaborated with our friends at RStudio to enable sparklyr to seamlessly work in Databricks clusters. Starting with sparklyr version 0.6, there is a new connection method in sparklyr: databricks. When calling spark_connect(method = "databricks") in a Databricks R Notebook, sparklyr will connect to the spark cluster of that notebook. As this cluster is fully managed, you do not need to specify any other information such as version, SPARK_HOME, etc.

I’d lean toward sparklyr over SparkR because of the former’s tidyverse-centric view.

Related Posts

Error Handling In Scala

Manish Mishra gives a few examples of how to handle errors in Scala: Try[T] is another construct to capture the success or a failure scenarios. It returns a value in both cases. Put any expression in Try and it will return Success[T] if the expression is successfully evaluated and will return Failure[T] in the other case […]

Read More

An Introduction To seplyr

John Mount guest blogs on the Revolutions blog about seplyr: seplyr is an R package that supplies improved standard evaluation interfaces for many common data wrangling tasks. The core of seplyr is a re-skinning of dplyr‘s functionality to seplyr conventions (similar to how stringr re-skins the implementing package stringi). Read on for a couple of examples of where seplyr can make it easier for you to […]

Read More