Checkpointing Code For Reproduction

Kevin Feasel

2017-08-16

R

David Smith tells an interesting story about a reproducibility problem with data analysis:

Timo Grossenbacher, data journalist with Swiss Radio and TV in Zurich, had a bit of a surprise when he attempted to recreate the results of one of the R Markdown scripts published by SRF Data to accompany their data journalism story about vested interests of Swiss members of parliament. Upon re-running the analysis in R last week, Timo was surprised when the results differed from those published in August 2015. There was no change to the R scripts or data in the intervening two-year period, so what caused the results to be different?

The version of R Timo was using had been updated, but that wasn’t the root cause of the problem. What had also changed was the version of the dplyr package used by the script: version 0.5.0 now, versus version 0.4.2 then. For some unknown reason, a change in the dplyr package in the intervening package caused some data rows (shown in red above) to be deleted during the data preparation process, and so the results changed.

Click through for the solution, which is pretty easy in R.

Related Posts

The Theory Behind cdata

John Mount has a video explaining the concepts behind cdata: We also have two really nifty articles on the theory and methods: Fluid data reshaping with cdata Coordinatized Data: A Fluid Data Specification Please give it a try! Click through for the video, which I found very helpful in tying together a number of data […]

Read More

Microsoft R Open 3.4.3

David Smith announces Microsoft R Open 3.4.3: Microsoft R Open (MRO), Microsoft’s enhanced distribution of open source R, has been upgraded to version 3.4.3 and is now available for download for Windows, Mac, and Linux. This update upgrades the R language engine to the latest R (version 3.4.3) and updates the bundled packages (specifically: checkpoint, curl, doParallel, foreach, and iterators) to new versions. MRO is 100% compatible with […]

Read More

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031