Joining Tables In SparkR

Kevin Feasel

2017-06-13

R, Spark

WenSui Liu has a script to join tables together in SparkR:

# INNER JOIN
showDF(merge(sum1, sum2, by.x = "month1", by.y = "month2", all = FALSE))
showDF(join(sum1, sum2, sum1$month1 == sum2$month2, "inner"))
#+------+-------+------+-------+
#|month1|min_dep|month2|max_dep|
#+------+-------+------+-------+
#|     3|    -25|     3|    911|
#|     2|    -33|     2|    853|
#+------+-------+------+-------+

There’s no commentary, so it’s all script all the time.  H/T R-bloggers

Related Posts

R Services Internal Communication Mechanisms

Niels Berglund continues his R Services internals series: When browsing for the symbols, you can use this command: x /1 *!TCP*. By using the option /1 you’ll only see the names, and no addresses. On my machine that gives me quite a lot, but there are two entries that catch my eye: sqllang!Tcp::AcceptConnection and sqllang!Tcp::Close. So let us set breakpoints at […]

Read More

Compacting Shared Libraries In R

Dirk Eddelbuettel compacts the tidyverse: Of course, there is a third way: just run strip --strip-debug over all the shared libraries after the build. As the path is standardized, and the shell does proper globbing, we can just do $ strip --strip-debug /usr/local/lib/R/site-library/*/libs/*.so using a double-wildcard to get all packages (in that R package directory) and all their shared […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930