Suppose we had a large data set hosted on a Spark cluster that we wished to work with using dplyr and sparklyr (for this article we will simulate such a data set using data loaded into Spark from the nycflights13 package). We will work through a trivial example: taking a quick peek at your data. The analyst should always be able and willing to look at the data.
It is easy to look at the top of the data, or at any specific set of rows.
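A minimal sketch of this setup follows. It assumes a local Spark instance (for example one installed with sparklyr::spark_install()) stands in for the real cluster. The table name "flights", the row count passed to head(), and the filter on month, day and carrier are illustrative choices, not taken from the article. Because Spark tables have no inherent row order, a "specific set of rows" is picked here by a predicate rather than by position.

```r
library(dplyr)
library(sparklyr)

# Assumption: a local Spark instance simulates the remote cluster
sc <- spark_connect(master = "local")

# Copy nycflights13's flights table into Spark; flights_spark is a remote tbl
flights_spark <- copy_to(sc, nycflights13::flights,
                         name = "flights", overwrite = TRUE)

# Peek at the top of the data; head() is translated to a SQL LIMIT on the Spark side
print(head(flights_spark, 5))

# Look at a specific set of rows, chosen by a column predicate
flights_spark %>%
  filter(month == 1, day == 1, carrier == "UA") %>%
  select(year, month, day, carrier, flight, origin, dest) %>%
  head(10) %>%
  print()
```

Since Spark does not guarantee row order, repeated calls to head() may return different rows unless the query is sorted first (for example with arrange()). Call spark_disconnect(sc) when finished.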
Read on for more details.