Moving From reshape2 To tidyr

Kevin Feasel

2017-12-19

R

Martin Johnsson talks about a couple tricky bits when moving from reshape2 to tidyr:

In practice, I don’t think people always take their data frames all the way to tidy. For example, to make a scatterplot, it is convenient to keep a couple of variables as different columns. The key is that we need to move between different forms rapidly (brain time-rapidly, more than computer time-rapidly, I might add).

And not everything should be organized this way. If you’re a geneticist, genotypes are notoriously inconvenient in normalized form. Better keep that individual by marker matrix.

The first serious piece of R code I wrote for someone else was a function to turn data into long form for plotting. I suspect plotting is often the gateway to tidy data. The function was like what you’d expect from R code written by a beginner who comes from C-style languages: It reinvented the wheel, and I bet it had nested for loops, a bunch of hard bracket indices, and so on. Then I discovered reshape2.

I’d not used reshape2 before, having started with tidyr, so it was interesting to see the contrast.

Related Posts

Using Cohen’s D for Experiments

Nina Zumel takes us through Cohen’s D, a useful tool for determining effect sizes in experiments: Cohen’s d is a measure of effect size for the difference of two means that takes the variance of the population into account. It’s defined asd = | μ1 – μ2 | / σpooledwhere σpooled is the pooled standard deviation over both cohorts. […]

Read More

Comparing Iterator Performance in R

Ulrik Stervbo has a performance comparison for for, apply, and map functions in R: It is usually said, that for– and while-loops should be avoided in R. I was curious about just how the different alternatives compare in terms of speed. The first loop is perhaps the worst I can think of – the return vector is […]

Read More

Categories

December 2017
MTWTFSS
« Nov Jan »
 123
45678910
11121314151617
18192021222324
25262728293031