Outlier Detection With dplyr And ruler

Evgeni Chasnovski shows how to use a couple R packages in concert to find outliers:

During the process of data analysis one of the most crucial steps is to identify and account for outliers, observations that have essentially different nature than most other observations. Their presence can lead to untrustworthy conclusions. The most complicated part of this task is to define a notion of “outlier”. After that, it is straightforward to identify them based on given data.

There are many techniques developed for outlier detection. Majority of them deal with numerical data. This post will describe the most basic ones with their application using dplyrand ruler packages.

After reading this post you will know:

  • Most basic outlier detection techniques.

  • A way to implement them using dplyr and ruler.

  • A way to combine their results in order to obtain a new outlier detection method.

  • A way to discover notion of “diamond quality” without prior knowledge of this topic (as a happy consequence of previous point).

Read the whole thing.  H/T R-Bloggers

Related Posts

The Lesser-Known Apply Functions In R

Andrew Treadway covers a few of the lesser-known apply functions in R: rapply Let’s start with rapply. This function has a couple of different purposes. One is to recursively apply a function to a list. We’ll get to that in a moment. The other use of rapply is to a apply a function to only those elements in […]

Read More

Bias Correction In Standard Deviation Estimates

John Mount explains how to perform bias correction and explains why it happens so rarely in practice: The bias in question is falling off at a rate of 1/n (where n is our sample size). So the bias issue loses what little gravity it ever may have ever had when working with big data. Most sources of noise will […]

Read More

Categories

December 2017
MTWTFSS
« Nov Jan »
 123
45678910
11121314151617
18192021222324
25262728293031