Moving From reshape2 To tidyr

Kevin Feasel



Martin Johnsson talks about a couple tricky bits when moving from reshape2 to tidyr:

In practice, I don’t think people always take their data frames all the way to tidy. For example, to make a scatterplot, it is convenient to keep a couple of variables as different columns. The key is that we need to move between different forms rapidly (brain time-rapidly, more than computer time-rapidly, I might add).

And not everything should be organized this way. If you’re a geneticist, genotypes are notoriously inconvenient in normalized form. Better keep that individual by marker matrix.

The first serious piece of R code I wrote for someone else was a function to turn data into long form for plotting. I suspect plotting is often the gateway to tidy data. The function was like what you’d expect from R code written by a beginner who comes from C-style languages: It reinvented the wheel, and I bet it had nested for loops, a bunch of hard bracket indices, and so on. Then I discovered reshape2.

I’d not used reshape2 before, having started with tidyr, so it was interesting to see the contrast.

Related Posts

Beware Multi-Assignment dplyr::mutate() Statements

John Mount hits on an issue when using dplyr backed by a database in R: Notice the above gives an incorrect result: all of the x_i columns are identical, and all of the y_i columns are identical. I am not saying the above code is in any way desirable (though something like it does arise naturally in certain test […]

Read More

The Theory Behind cdata

John Mount has a video explaining the concepts behind cdata: We also have two really nifty articles on the theory and methods: Fluid data reshaping with cdata Coordinatized Data: A Fluid Data Specification Please give it a try! Click through for the video, which I found very helpful in tying together a number of data […]

Read More


December 2017
« Nov Jan »