Interpreting P-Value Histograms

David Robinson visualizes and interprets different p-value histograms:

So you’re a scientist or data analyst, and you have a little experience interpreting p-values from statistical tests. But then you come across a case where you have hundreds, thousands, or even millions of p-values. Perhaps you ran a statistical test on each gene in an organism, or on demographics within each of hundreds of counties. You might have heard about the dangers of multiple hypothesis testing before. What’s the first thing you do?

Make a histogram of your p-values. Do this before you perform multiple hypothesis test correction, false discovery rate control, or any other means of interpreting your many p-values. Unfortunately, for some reason, this basic and simple task rarely gets recommended (for instance, the Wikipedia page on the multiple comparisons problem never once mentions this approach). This graph lets you get an immediate sense of how your test behaved across all your hypotheses, and immediately diagnose some potential problems. Here, I’ll walk you through a basic example of interpreting a p-value histogram.

It’s a fun read and informative as well.

Related Posts

Biases in Tree-Based Models

Nina Zumel looks at tree-based ensembling models like random forest and gradient boost and shows that they can be biased: In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data […]

Read More

R 3.6.1 Available

David Smith notes a new version of R is available: On July 5, the R Core Group released the source code for the latest update to R, R 3.6.1, and binaries are now available to download for Windows, Linux and Mac from your local CRAN mirror. R 3.6.1 is a minor update to R that fixes a few bugs. […]

Read More


November 2017
« Oct Dec »