Getting Distinct Rows In R

Kevin Feasel

2017-09-01

R

Rob J. Hyndman shows four different techniques (one “classic” and three tidyverse) for getting a distinct subset of a data set in R:

So that looks much better — clean, short, and easy to understand. But is it fast? Rather than grabbing the first lines of each group, it has to go searching for duplicates. But avoiding grouping and ungrouping must save some time.

So I ran some microbenchmark timings:

Click through for techniques and timings.  I’m not surprised that the “classic” method won out in terms of time, but for explanatory value, I’d definitely prefer trying to explain the tidyverse distinct version.  H/T R-Bloggers

Related Posts

Using Service Broker To Queue Up External Script Calls

Arvind Shyamsundar shows how to use Service Broker to run external R or Python scripts based on new data coming into a transactional system: Here, we will show you how you can use the asynchronous execution mechanism offered by SQL Server Service Broker to ‘queue’ up data inside SQL Server which can then be asynchronously passed to […]

Read More

Reasons For Using Docker With R

Jeroen Ooms gives us a few reasons why we might want to containerize our R-based products: The flagship of the OpenCPU system is the OpenCPU server: a mature and powerful Linux stack for embedding R in systems and applications. Because OpenCPU is completely open source we can build and ship on DockerHub. A ready-to-go linux server […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930