In a similar spirit to how sparklyr allowed us to reuse our functions from the dplyr package to manipulate Spark DataFrames, the RxSpark API allows a data scientist to develop code that can be deployed in a multitude of environments. This lets the developer shift their focus away from writing code that’s specific to a certain environment and toward the complex analysis of their data science problem. We call this flexibility Write Once, Deploy Anywhere, or WODA for the acronym lovers.
For a deeper dive into the RevoScaleR package, I recommend you take a look at the online course, Analyzing Big Data with Microsoft R Server. Much of this blog post follows the last section of the course, on deployment to Spark.
R isn’t just for small, one-off jobs anymore.