replyr

Kevin Feasel

2017-03-08

Hadoop, R, Spark

John Mount shows off replyr, which is dplyr for remote, distributed data sets (think SparkR or sparklyr):

Suppose we had a large data set hosted on a Spark cluster that we wished to work with using dplyr and sparklyr (for this article we will simulate such using data loaded into Spark from the nycflights13 package).

We will work a trivial example: taking a quick peek at your data. The analyst should always be able to and willing to look at the data.

It is easy to look at the top of the data, or any specific set of rows of the data.

Read on for more details.

Related Posts

Polar Charts In Power BI With R

Leila Etaati shows how to build a polar chart in Power BI using an R component: I just add a layer to the above furmula “coord_polar()” this function also has been used for creating pie charts. it gets the “theta” variable, in below example I put theta=y axis, so we have below charts Normally I […]

Read More

Long-Term Storage In Kafka

Jay Kreps shows us that you can use Kafka as a primary data store: The short answer is that it’s not insane, people do this all the time, and Kafka was actually designed for this type of usage. But first, why might you want to do this? There are actually a number of use cases, […]

Read More

Categories

March 2017
MTWTFSS
« Feb Apr »
 12345
6789101112
13141516171819
20212223242526
2728293031