Be Wary Of Colliders When Analyzing Data

Keith Goldfeld has an interesting demonstration of a collider variable and how it can lead us to incorrect conclusions during analysis:

In this (admittedly thoroughly made-up though not entirely implausible) network diagram, the test score outcome is a collider, influenced by a test preparation class and socio-economic status (SES). In particular, both the test prep course and high SES are related to the probability of having a high test score. One might expect an arrow of some sort to connect SES and the test prep class; in this case, participation in test prep is randomized so there is no causal link (and I am assuming that everyone randomized to the class actually takes it, a compliance issue I addressed in a series of posts starting with this one.)

The researcher who carried out the randomization had a hypothesis that test prep actually is detrimental to college success down the road, because it de-emphasizes deep thinking in favor of wrote memorization. In reality, it turns out that the course and subsequent college success are not related, indicated by an absence of a connection between the course and the long term outcome.

Read the whole thing.  H/T R-Bloggers

Related Posts

Plotting ML Results In R

Bernardo Lares shows off the plots he creates in R to compare ML models: Split and compare quantiles This parameter is the easiest to sell to the C-level guys. “Did you know that with this model, if we chop the worst 20% of leads we would have avoided 60% of the frauds and only lose […]

Read More

Scatterplots For Multivariate Analysis

Neil Saunders declutters a complicated visual with a simple scatterplot: Sydney’s congestion at ‘tipping point’ blares the headline and to illustrate, an interactive chart with bars for city population densities, points for commute times and of course, dual-axes. Yuck. OK, I guess it does show that Sydney is one of three cities that are low density, […]

Read More


March 2018
« Feb Apr »