Checkpointing Code For Reproduction

Kevin Feasel

2017-08-16

R

David Smith tells an interesting story about a reproducibility problem with data analysis:

Timo Grossenbacher, data journalist with Swiss Radio and TV in Zurich, had a bit of a surprise when he attempted to recreate the results of one of the R Markdown scripts published by SRF Data to accompany their data journalism story about vested interests of Swiss members of parliament. Upon re-running the analysis in R last week, Timo was surprised when the results differed from those published in August 2015. There was no change to the R scripts or data in the intervening two-year period, so what caused the results to be different?

The version of R Timo was using had been updated, but that wasn’t the root cause of the problem. What had also changed was the version of the dplyr package used by the script: version 0.5.0 now, versus version 0.4.2 then. For some unknown reason, a change in the dplyr package in the intervening package caused some data rows (shown in red above) to be deleted during the data preparation process, and so the results changed.

Click through for the solution, which is pretty easy in R.

Related Posts

Biases in Tree-Based Models

Nina Zumel looks at tree-based ensembling models like random forest and gradient boost and shows that they can be biased: In our previous article , we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data […]

Read More

R 3.6.1 Available

David Smith notes a new version of R is available: On July 5, the R Core Group released the source code for the latest update to R, R 3.6.1, and binaries are now available to download for Windows, Linux and Mac from your local CRAN mirror. R 3.6.1 is a minor update to R that fixes a few bugs. […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031