Getting Distinct Rows In R

Kevin Feasel



Rob J. Hyndman shows four different techniques (one “classic” and three tidyverse) for getting a distinct subset of a data set in R:

So that looks much better — clean, short, and easy to understand. But is it fast? Rather than grabbing the first lines of each group, it has to go searching for duplicates. But avoiding grouping and ungrouping must save some time.

So I ran some microbenchmark timings:

Click through for techniques and timings.  I’m not surprised that the “classic” method won out in terms of time, but for explanatory value, I’d definitely prefer trying to explain the tidyverse distinct version.  H/T R-Bloggers

Related Posts

Faster User-Defined Functions In SparkR

Liang Zhang and Hossein Falaki note a major performance improvement for functions in SparkR using the latest version of the Databricks Runtime: SparkR offers four APIs that run a user-defined function in R to a SparkDataFrame dapply() dapplyCollect() gapply() gapplyCollect() dapply() allows you to run an R function on each partition of the SparkDataFrame and returns […]

Read More

Subsetting Matrices In R

Dave Mason continues his look at matrices in R: We can extract an entire row from a matrix. To do this, specify the desired row only within the square brackets [ ]. The placeholder where you would otherwise specify the column is left empty. > #Points scored by Kendrick Perkins. > points_scored_by_quarter[1,] 1st 2nd 3rd 4th […]

Read More


September 2017
« Aug Oct »