Getting Distinct Rows In R

Kevin Feasel

2017-09-01

R

Rob J. Hyndman shows four different techniques (one “classic” and three tidyverse) for getting a distinct subset of a data set in R:

So that looks much better — clean, short, and easy to understand. But is it fast? Rather than grabbing the first lines of each group, it has to go searching for duplicates. But avoiding grouping and ungrouping must save some time.

So I ran some microbenchmark timings:

Click through for the techniques and timings. I’m not surprised that the “classic” method won on speed, but for explanatory value, I’d much rather explain the tidyverse distinct version. H/T R-Bloggers
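For a rough idea of what the two styles look like side by side, here is a minimal sketch. It is not the code from Hyndman's post: the data frame `df` and its columns `id`, `grp`, and `value` are made up for illustration, and the tidyverse part runs only if dplyr happens to be installed.

```r
# Hypothetical example data: two key columns with repeats, plus a payload column.
df <- data.frame(
  id    = c(1, 1, 2, 2, 3),
  grp   = c("a", "a", "b", "c", "c"),
  value = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# "Classic" base R: duplicated() flags repeats of an (id, grp) pair,
# so negating it keeps the first row for each combination.
base_distinct <- df[!duplicated(df[, c("id", "grp")]), ]
rownames(base_distinct) <- NULL
print(base_distinct)

# Tidyverse versions, guarded so the sketch still runs without dplyr.
if (requireNamespace("dplyr", quietly = TRUE)) {
  # distinct() with .keep_all = TRUE keeps the first row per key
  # combination and retains the non-key columns.
  tidy_distinct <- dplyr::distinct(df, id, grp, .keep_all = TRUE)
  print(tidy_distinct)

  # Grouping approach: take the first row of each group, then ungroup.
  grouped_first <- dplyr::ungroup(
    dplyr::slice(dplyr::group_by(df, id, grp), 1)
  )
  print(grouped_first)
}
```

The base R line is compact but takes a moment to parse, while `distinct()` states the intent directly, which is the readability trade-off mentioned above. To compare speed on your own data, wrapping each expression in a benchmarking call would be the natural next step.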
