Ali Zaidi builds a Spark cluster to analyze 1.1 billion taxi cab rides using Microsoft R Server:
In a similar spirit to how sparklyr allowed us to reuse our functions from the dplyr package to manipulate Spark DataFrames, the RxSpark API allows a data scientist to develop code that can be deployed in a multitude of environments. This allows the developer to shift their focus from writing code that’s specific to a certain environment, and instead focus on the complex analysis of their data science problem. We call this flexibility Write Once, Deploy Anywhere, or WODA for the acronym lovers.

For a deeper dive into the RevoScaleR package, I recommend you take a look at the online course, Analyzing Big Data with Microsoft R Server. Much of this blogpost follows along the last section of the course, on deployment to Spark.
R isn’t just for small, one-off jobs anymore.