Evgeni Chasnovski shows how to use a couple of R packages in concert to find outliers:
During the process of data analysis one of the most crucial steps is to identify and account for outliers, observations that have essentially different nature than most other observations. Their presence can lead to untrustworthy conclusions. The most complicated part of this task is to define a notion of “outlier”. After that, it is straightforward to identify them based on given data.
There are many techniques developed for outlier detection. Majority of them deal with numerical data. This post will describe the most basic ones with their application using dplyr and ruler packages.
After reading this post you will know:
- Most basic outlier detection techniques.
- A way to implement them using dplyr and ruler.
- A way to combine their results in order to obtain a new outlier detection method.
- A way to discover notion of “diamond quality” without prior knowledge of this topic (as a happy consequence of previous point).
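To give a rough feel for the "combine their results" idea, here is a minimal sketch of my own, not code from Evgeni's post. It assumes two common basic rules (a z-score cutoff and Tukey's fences), uses only dplyr, and runs on made-up data; the helper names `is_out_z` and `is_out_tukey` are hypothetical, and the rules and thresholds in the linked post may well differ.

```r
library(dplyr)

# Hypothetical helper: flag values more than `thres` standard deviations
# away from the mean (z-score rule).
is_out_z <- function(x, thres = 3) {
  abs(x - mean(x, na.rm = TRUE)) > thres * sd(x, na.rm = TRUE)
}

# Hypothetical helper: flag values outside Tukey's fences,
# i.e. below Q1 - k * IQR or above Q3 + k * IQR.
is_out_tukey <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  (x < q[1] - k * iqr) | (x > q[2] + k * iqr)
}

# Made-up illustrative data: twenty ordinary values plus one extreme value.
toy <- tibble(value = c(rep(c(9, 10, 11, 10), 5), 50))

flagged <- toy %>%
  mutate(
    out_z      = is_out_z(value),
    out_tukey  = is_out_tukey(value),
    n_flags    = out_z + out_tukey,  # how many rules flagged this row
    is_outlier = n_flags == 2        # combined rule: both must agree
  )

# Show only the rows that the combined rule treats as outliers.
filter(flagged, is_outlier)
```

Requiring both rules to agree is just one way to combine them; counting flags and picking a cutoff leaves room for a looser "at least one rule" variant.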
Read the whole thing. H/T R-Bloggers