Why Does Empirical Variance Use n-1 Instead Of n?

Sebastian Sauer gives us a simulation showing why we use n-1 instead of n as the denominator when calculating the variance of a sample:

Our results show that the variance of the sample is smaller than the empirical variance; however even the empirical variance too is a little too small compared with the population variance (which is 1). Note that sample size was n=10 in each draw of the simulation. With sample size increasing, both should get closer to the “real” (population) sample size (although the bias is negligible for the empirical variance). Let’s check that.

This is an R-heavy post and does a great job of showing that it’s necessary, and ends with  recommended reading if you want to understand the why.

Related Posts

Issues Starting ML Services

Jen Stirrup has a quick rundown of some reasons why Machine Learning Services might give you an error when you try to start it up: Msg 39023, Level 16, State 1, Procedure sp_execute_external_script, Line 1 [Batch Start Line 3] ‘sp_execute_external_script’ is disabled on this instance of SQL Server. Use sp_configure ‘external scripts enabled’ to enable […]

Read More

Using Have I Been Pwned In R

Maelle Salmon shows us how to use the HIBPwned library in R: The alternative title of this blog post is HIBPwned version 0.1.7 has been released! W00t!. Steph’s HIBPwned package utilises the HaveIBeenPwned.com API to check whether email addresses and/or user names have been present in any publicly disclosed data breach. In other words, this package potentially delivers bad news, but useful […]

Read More


March 2018
« Feb Apr »