Standard Deviation Estimation

Kevin Feasel

2016-06-23

R

Dan Goldstein gives a rule of thumb for getting standard deviations for various distributions:

Say you’ve got 30 numbers and a strong urge to estimate their standard deviation. But you’ve left your computer at home. Unless you’re really good at mentally squaring and summing, it’s pretty hard to compute a standard deviation in your head. But there’s a heuristic you can use:

Subtract the smallest number from the largest number and divide by four

Let’s call it the “range over four” heuristic. You could, and probably should, be skeptical. You could want to see how accurate the heuristic is. And you could want to see how the heuristic’s accuracy depends on the distribution of numbers you are dealing with.

Sometimes you just don’t have STDEV() available.

Related Posts

ElasticMapReduce And RStudio

Tanzir Musabbir demonstrates how to set up Amazon ElasticMapReduce to include an RStudio edge node: RStudio Server provides a browser-based interface for R and a popular tool among data scientists. Data scientist use Apache Spark cluster running on  Amazon EMR to perform distributed training. In a previous blog post, the author showed how you can install RStudio Server on Amazon […]

Read More

Mutating Data Frames Without dplyr

John Mount points out that there is a built-in function to mutate data frames in R: The notation we used above is the “explicit argument” variation we recommend for readability. What a lot of dplyr users do not seem to know is: base-R already has this functionality. The function is called transform(). To demonstrate this, let’s first detach dplyr to show that […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930