Joining Tables In SparkR

Kevin Feasel


R, Spark

WenSui Liu has a script to join tables together in SparkR:

showDF(merge(sum1, sum2, by.x = "month1", by.y = "month2", all = FALSE))
showDF(join(sum1, sum2, sum1$month1 == sum2$month2, "inner"))
#|     3|    -25|     3|    911|
#|     2|    -33|     2|    853|

There’s no commentary, so it’s all script all the time.  H/T R-bloggers
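If you want to try the two calls without the rest of the script, here is a minimal sketch. It assumes a local Spark installation that SparkR can find. The `sum1` and `sum2` contents, the `total1` and `total2` column names, and the app name are made up for illustration and are not WenSui's data.

```r
library(SparkR)

# Start a local Spark session (assumes Spark is installed and discoverable).
sparkR.session(master = "local[1]", appName = "join-sketch")

# Hypothetical monthly summaries; the values are invented for illustration.
sum1 <- createDataFrame(data.frame(month1 = c(1L, 2L, 3L), total1 = c(10, 20, 30)))
sum2 <- createDataFrame(data.frame(month2 = c(2L, 3L, 4L), total2 = c(200, 300, 400)))

# merge() follows base R conventions: by.x / by.y name the key columns,
# and all = FALSE asks for an inner join.
merged <- merge(sum1, sum2, by.x = "month1", by.y = "month2", all = FALSE)

# join() takes an explicit join expression plus a join type string.
joined <- join(sum1, sum2, sum1$month1 == sum2$month2, "inner")

showDF(merged)
showDF(joined)

# Both results should keep only the months that appear in both inputs.
n_merged <- count(merged)
n_joined <- count(joined)

sparkR.session.stop()
```

The `merge` form will feel familiar if you are coming from base R, while `join` is closer to the Spark DataFrame API and makes the join condition explicit.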


