Sparklyr

Kevin Feasel

2016-10-04

R, Spark

RStudio has announced an interface between R and Apache Spark, named sparklyr:

Over the past couple of years we’ve heard time and time again that people want a native dplyr interface to Spark, so we built one! sparklyr also provides interfaces to Spark’s distributed machine learning algorithms and much more. Highlights include:

  • Interactively manipulate Spark data using both dplyr and SQL (via DBI).

  • Filter and aggregate Spark datasets then bring them into R for analysis and visualization.

  • Orchestrate distributed machine learning from R using either Spark MLlib or H2O SparkingWater.

  • Create extensions that call the full Spark API and provide interfaces to Spark packages.

  • Integrated support for establishing Spark connections and browsing Spark DataFrames within the RStudio IDE.

So what’s the difference between sparklyr and SparkR?

This might be the package I’ve been awaiting.

Related Posts

Building TensorFlow Neural Networks On Spark With Keras

Jules Damji has an example of using the PyCharm IDE to use Keras to build TensorFlow neural network models on the Databricks MLflow library: Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and […]

Read More

Scatterplots For Multivariate Analysis

Neil Saunders declutters a complicated visual with a simple scatterplot: Sydney’s congestion at ‘tipping point’ blares the headline and to illustrate, an interactive chart with bars for city population densities, points for commute times and of course, dual-axes. Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, […]

Read More

Categories

October 2016
MTWTFSS
« Sep Nov »
 12
3456789
10111213141516
17181920212223
24252627282930
31