Analyzing Taxi Data With Microsoft R Server

Kevin Feasel


R, Spark

Ali Zaidi builds a Spark cluster to analyze 1.1 billion taxi cab rides using Microsoft R Server:

In a similar spirit to how sparklyr allowed us to reuse our functions from the dplyr package to manipulate Spark DataFrames, the RxSpark API allows a data scientist to develop code that can be deployed in a multitude of environments. This allows the developer to shift their focus from writing code that’s specific to a certain environment, and instead focus on the complex analysis of their data science problem. We call this flexibility Write Once, Deploy Anywhere, or WODA for the acronym lovers.

For a deeper dive into the RevoScaleR package, I recommend you take a look at the online course, Analyzing Big Data with Microsoft R Server. Much of this blogpost follows along the last section of the course, on deployment to Spark.

R isn’t just for small, one-off jobs anymore.

Related Posts

Azure Databricks And Active Directory

Tristan Robinson wraps up a two-parter on Azure Databricks security: With the addition of Databricks runtime 5.1 which was released December 2018, comes the ability to use Azure AD credential pass-through. This is a huge step forward since there is no longer a need to control user permissions through Databricks Groups / Bash and then […]

Read More

Using Plotly In Power BI

Kara Annanie shows how you can R integration in Power BI to push Plotly visuals to users: In the example, above, we’ve created a line chart visualization using Plotly and we’ve decided to put labels on the graph, but only on the first and last points of the line graph. This graph would be particularly […]

Read More


December 2016
« Nov Jan »