Using sparklyr

Kevin Feasel

2017-05-31

R, Spark

Hossein Falaki and Xiangrui Meng show how to use sparklyr on a Databricks Spark cluster:

We collaborated with our friends at RStudio to enable sparklyr to seamlessly work in Databricks clusters. Starting with sparklyr version 0.6, there is a new connection method in sparklyr: databricks. When calling spark_connect(method = "databricks") in a Databricks R Notebook, sparklyr will connect to the spark cluster of that notebook. As this cluster is fully managed, you do not need to specify any other information such as version, SPARK_HOME, etc.

I’d lean toward sparklyr over SparkR because of the former’s tidyverse-centric design: it works as a dplyr backend, so the usual dplyr verbs run against Spark tables.
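To make that concrete, here is a minimal sketch of what the connection described above might look like in practice. It assumes the code runs inside a Databricks R notebook with sparklyr 0.6 or later and dplyr installed. The `mtcars` data set and the table name `mtcars_demo` are my own choices for illustration and do not come from the Databricks post.

```r
library(sparklyr)
library(dplyr)

# Assumption: this runs in a Databricks R notebook, so no master,
# version, or SPARK_HOME needs to be supplied.
sc <- spark_connect(method = "databricks")

# Copy a small built-in data frame to the cluster.
# "mtcars_demo" is a hypothetical table name chosen for this example.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars_demo", overwrite = TRUE)

# Ordinary dplyr verbs are translated to Spark SQL and run on the cluster;
# collect() pulls the (small) aggregated result back into the R session.
avg_mpg_by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl) %>%
  collect()

print(avg_mpg_by_cyl)
```

The part worth noticing is that nothing after `spark_connect()` is Spark-specific: the pipeline reads like dplyr code against a local data frame, which is the tidyverse-centric appeal.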
