replyr

Kevin Feasel

2017-03-08

Hadoop, R, Spark

John Mount shows off replyr, which is dplyr for remote, distributed data sets (think SparkR or sparklyr):

Suppose we had a large data set hosted on a Spark cluster that we wished to work with using dplyr and sparklyr (for this article we will simulate such using data loaded into Spark from the nycflights13 package).

We will work a trivial example: taking a quick peek at your data. The analyst should always be able to and willing to look at the data.

It is easy to look at the top of the data, or any specific set of rows of the data.

Read on for more details.

Related Posts

Plotting ML Results In R

Bernardo Lares shows off the plots he creates in R to compare ML models: Split and compare quantiles This parameter is the easiest to sell to the C-level guys. “Did you know that with this model, if we chop the worst 20% of leads we would have avoided 60% of the frauds and only lose […]

Read More

Building TensorFlow Neural Networks On Spark With Keras

Jules Damji has an example of using the PyCharm IDE to use Keras to build TensorFlow neural network models on the Databricks MLflow library: Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and […]

Read More

Categories

March 2017
MTWTFSS
« Feb Apr »
 12345
6789101112
13141516171819
20212223242526
2728293031