Reshaping Data With R

Kevin Feasel



Alberto Giudici compares tidyr to reshape2 for data cleansing in R:

We see a different behaviour: gather() has brought messy into a long data format with a warning by treating all columns as variable, while melt() has treated trt as an “id variables”. Id columns are the columns that contain the identifier of the observation that is represented as a row in our data set. Indeed, if melt() does not receive any id.variables specification, then it will use the factor or character columns as id variables. gather() requires the columns that needs to be treated as ids, all the other columns are going to be used as key-value pairs.

Despite those last different results, we have seen that the two functions can be used to perform the exactly same operations on data frames, and only on data frames! Indeed, gather() cannot handle matrices or arrays, while melt() can as shown below.

It seems that these two tools have some overlap, but each has its own point of focus:  tidyr is simpler for data tidying, whereas reshape2 has functionality (like data aggregation) which tidyr does not include.

Related Posts

Inline Operators In R With wrapr

John Mount shows how to use inline operators in R with the wrapr package: The above code is assuming you have the wrapr package attached via already having run library('wrapr'). Notice we picked R-related operator names. We stayed away from overloading the + operator, as the arithmetic operators are somewhat special in how they dispatch in R. The goal wasn’t […]

Read More

Feature And Text Classification Using Naive Bayes In R

I wrap up my series on the Naive Bayes class of algorithms, finally writing some code along the way: Now we’re going to look at movie reviews and predict whether a movie review is a positive or a negative review based on its words. If you want to play along at home, grab the data set, […]

Read More


June 2016
« May Jul »