Simpson’s Paradox Explained

Mehdi Daoudi, et al., have a nice explanation of Simpson’s Paradox:

E.H. Simpson first described the phenomenon of Simpson’s paradox in 1951. The actual name “Simpson’s paradox” was introduced by Colin R. Blyth in 1972. Blyth mentioned that:

G.W. Haggstrom pointed out that Simpson’s paradox is the simplest form of the false correlation paradox in which the domain of x is divided into short intervals, on each of which y is a linear function of x with large negative slope, but these short line segments get progressively higher to the right, so that over the whole domain of x, the variable y is practically a linear function of x with large positive slope.
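To make Haggstrom's picture concrete, here is a minimal sketch in plain Python. The segment positions, offsets, and slopes are made-up values chosen for illustration, not data from the linked article, and the helper names `ols_slope` and `make_segment` are hypothetical. Each short segment slopes downward, but each one sits higher than the segment to its left, so a line fitted to the pooled points slopes upward.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def make_segment(start, offset, points=5, step=0.25, slope=-2.0):
    """Build one short interval of x on which y falls linearly (illustrative values)."""
    xs = [start + i * step for i in range(points)]
    ys = [offset + slope * (x - start) for x in xs]
    return xs, ys


# Four short segments; each starts 5 units higher than the one to its left.
segments = [make_segment(start=k, offset=5.0 * k) for k in range(4)]

# Slope within each segment: negative by construction.
segment_slopes = [ols_slope(xs, ys) for xs, ys in segments]

# Slope over the pooled data: dominated by the upward drift between segments.
all_x = [x for xs, _ in segments for x in xs]
all_y = [y for _, ys in segments for y in ys]
pooled_slope = ols_slope(all_x, all_y)

print("within-segment slopes:", segment_slopes)
print("pooled slope:", pooled_slope)
```

The within-segment slopes all come out negative while the pooled slope is positive, which is the sign reversal the quote describes.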

The authors also provide a helpful example with operational metrics, showing how aggregating the data leads to an opposite (and invalid) conclusion.
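The same reversal shows up with rates rather than slopes. The sketch below is not the authors' example; it uses invented counts for two hypothetical configurations, A and B, measured on two traffic segments. The names (`observations`, `rate`, "desktop", "mobile") are assumptions made purely for illustration.

```python
# Hypothetical counts: (fast page loads, total page loads) per config and segment.
observations = {
    "A": {"desktop": (81, 87), "mobile": (192, 263)},
    "B": {"desktop": (234, 270), "mobile": (55, 80)},
}


def rate(fast, total):
    """Fraction of page loads that were fast."""
    return fast / total


# Per-segment rates: compare A and B within the same kind of traffic.
segment_rates = {
    config: {seg: rate(*counts) for seg, counts in segs.items()}
    for config, segs in observations.items()
}

# Aggregated rates: sum the counts across segments first, then divide.
overall_rates = {
    config: rate(
        sum(fast for fast, _ in segs.values()),
        sum(total for _, total in segs.values()),
    )
    for config, segs in observations.items()
}

for seg in ("desktop", "mobile"):
    print(seg, {c: round(segment_rates[c][seg], 3) for c in observations})
print("overall", {c: round(r, 3) for c, r in overall_rates.items()})
```

With these counts, A has the higher fast-load rate on both desktop and mobile, yet B looks better once the segments are summed, because most of A's traffic falls in the slower mobile segment while most of B's falls in the faster desktop segment.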

