Suppose we had a large data set hosted on a Spark cluster that we wished to work with using dplyr and sparklyr (for this article we will simulate such a data set using data loaded into Spark from the nycflights13 package).
We will work a trivial example: taking a quick peek at your data. The analyst should always be able and willing to look at the data.
It is easy to look at the top of the data, or any specific set of rows of the data.
Read on for more details.