The Basics Of SparkR

Kevin Feasel


R, Spark

Yanbo Liang has an introductory article on what SparkR is and why you might want to use it:

However, data analysis using R is limited by the amount of memory available on a single machine and further as R is single threaded it is often impractical to use R on large datasets. To address R’s scalability issue, the Spark community developed SparkR package which is based on a distributed data frame that enables structured data processing with a syntax familiar to R users. Spark provides distributed processing engine, data source, off-memory data structures. R provides a dynamic environment, interactivity, packages, visualization. SparkR combines the advantages of both Spark and R.

In the following section, we will illustrate how to integrate SparkR with R to solve some typical data science problems from a traditional R users’ perspective.

This is a fairly introductory article, but gives an idea of what SparkR can accomplish.

Related Posts

Using DALEX To Explain Black-Box Models

Przemyslaw Biecek explains that there’s more than LIME for explaining black-box models: I’ve heard about a number of consulting companies, that decided to use simple linear model instead of a black box model with higher performance, because ,,client wants to understand factors that drive the prediction’’. And usually the discussion goes as following: ,,We have tried LIME […]

Read More

Comparing Keras In Python Versus R

Dmitry Kisler performs image classification using Keras in both Python and R: From the plots above, one can see that: the accuracy of your model doesn’t depend on the language you use to build and train it (the plot shows only train accuracy, but the model doesn’t have high variance and the bias accuracy is […]

Read More


April 2017
« Mar May »