Ten Notes On SparkR

Kevin Feasel


R, Spark

Neil Dewar has a notebook with ten important things when migrating from R to SparkR:

  1. Apache Spark Building Blocks. A high-level overview of Spark describes what is available for the R user.

  2. SparkContext, SQLContext, and SparkSession. In Spark 1.x, SparkContext and SQLContext let you access Spark. In Spark 2.x, SparkSession becomes the primary method.

  3. A DataFrame or a data.frame? Spark’s distributed DataFrame is different from R’s local data.frame. Knowing the differences lets you avoid simple mistakes.

  4. Distributed Processing 101. Understanding the mechanics of Big Data processing helps you write efficient code—and not blow up your cluster’s master node.

  5. Function Masking. Like all R libraries, SparkR masks some functions.

  6. Specifying Rows. With Big Data and Spark, you generally select rows in DataFrames differently than in local R data.frames.

  7. Sampling. Sample data in the right way, and use it as a tool for converting between big and small data.

  8. Machine Learning. SparkR has a growing library of distributed ML algorithms.

  9. Visualization.It can be hard to visualize big data, but there are tricks and tools which help.

  10. Understanding Error Messages. For R users, Spark error messages can be daunting. Knowing how to parse them helps you find the relevant parts.

I highly recommend checking out the notebook.

Related Posts

ggplot2 Geoms And Aesthetics

Tyler Rinker digs into ggplot2’s geoms and aesthetics: I thought it my be fun to use the geoms aesthetics to see if we could cluster aesthetically similar geoms closer together. The heatmap below uses cosine similarity and heirarchical clustering to reorder the matrix that will allow for like geoms to be found closer to one […]

Read More

Legible Function Chaining In R

John Mount shows a few techniques for legible function chaining with R: The dot intermediate convention is very succinct, and we can use it with base R transforms to get a correct (and performant) result. Like all conventions: it is just a matter of teaching, learning, and repetition to make this seem natural, familiar and legible. My […]

Read More


January 2017
« Dec Feb »