Why Does Empirical Variance Use n-1 Instead Of n?

Sebastian Sauer gives us a simulation showing why we use n-1 instead of n as the denominator when calculating the variance of a sample:

Our results show that the variance of the sample is smaller than the empirical variance; however, even the empirical variance is a little too small compared with the population variance (which is 1). Note that the sample size was n=10 in each draw of the simulation. With sample size increasing, both should get closer to the “real” (population) variance (although the bias is negligible for the empirical variance). Let’s check that.

This is an R-heavy post and does a great job of showing that the n-1 correction is necessary. It ends with recommended reading if you want to understand the why.
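
If you want to try the idea without leaving this page, here is a minimal sketch in Python. It is not Sebastian's R code. The function name `simulate`, the seed, and the number of draws are my own choices for illustration, and I am assuming that "variance of the sample" in the quote means the divide-by-n version while "empirical variance" means the divide-by-(n-1) version. It draws many samples from a standard normal distribution (population variance 1) and averages both estimators.

```python
import random


def simulate(n, draws=5000, seed=42):
    """Average two variance estimators over many samples of size n.

    Samples come from a standard normal distribution, so the true
    population variance is 1. Returns (mean of SS/n, mean of SS/(n-1)).
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(draws):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sum of squared deviations from the sample mean
        ss = sum((x - mean) ** 2 for x in sample)
        total_div_n += ss / n                  # denominator n
        total_div_n_minus_1 += ss / (n - 1)    # denominator n - 1
    return total_div_n / draws, total_div_n_minus_1 / draws


if __name__ == "__main__":
    for n in (5, 10, 100):
        div_n, div_n_minus_1 = simulate(n)
        print(f"n={n:>3}  divide by n: {div_n:.3f}  divide by n-1: {div_n_minus_1:.3f}")
```

With n=10, the divide-by-n average should land near 0.9 and the divide-by-(n-1) average near 1, with the gap shrinking as n grows. Exact figures depend on the seed and the number of draws.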


