Simpson’s Paradox Explained

Mehdi Daoudi, et al, have a nice explanation of Simpson’s Paradox:

E.H. Simpson first described the phenomenon of Simpson’s paradox in 1951. The actual name “Simpson’s paradox” was introduced by Colin R. Blyth in 1972. Blyth mentioned that:

G.W. Haggstrom pointed out that Simpson’s paradox is the simplest form of the false correlation paradox in which the domain of x is divided into short intervals, on each of which y is a linear function of x with large negative slope, but these short line segments get progressively higher to the right, so that over the whole domain of x, the variable y is practically a linear function of x with large positive slope.

The authors also provide a helpful example with operational metrics, showing how aggregating the data leads to an opposite (and invalid) conclusion.

Related Posts

Classifying Texts With Naive Bayes

I continue my series on Naive Bayes with another hand-calculation post: Step two is, on the surface, pretty tough: how do we figure out if a set of words is a business phrase or a baseball phrase? We could try to think up a set of features. For example, how long is the phrase? How many unique […]

Read More

Where Machine Learning And Econometrics Collide

Dave Giles shares some thoughts on how machine learning and econometrics relate: What is Machine Learning (ML), and how does it differ from Statistics (and hence, implicitly, from Econometrics)? Those are big questions, but I think that they’re ones that econometricians should be thinking about. And if I were starting out in Econometrics today, I’d […]

Read More


August 2017
« Jul Sep »