Data Manipulation In R

Kevin Feasel



Casimir Saternos has an article on matrix operations and other data transformations in R:

Operations that are conceptually simple can be difficult to perform using SQL.  Consider the common requirements to pivot or transpose a dataset.   Each of these actions are conceptually straightforward but are complex to implement using SQL.  The examples that follow are somewhat verbose, but the details are not significant. The main point is to illustrate is that, by using specialized functions outside of SQL,  R makes trivial some of those operations that would otherwise require complex SQL statements.  The contrast in the amount of code required is striking.  The simpler approach allows you to focus attention on the scientific or business problem at hand, rather than expending energy reading documentation or laboriously testing complex statements.

I consider this where the second-order value of R comes in.  The initial “wow” factor is in how easy you can plot things, and this ease of data cleansing is the next big time-saver.

Related Posts

Defining Tidy Data

John Mount shares thoughts about the concept of tidy data: A question is: is such a data set “tidy”? The paper itself claims the above definitions are “Codd’s 3rd normal form.” So, no the above table is not “tidy” under that paper’s definition. The the winner’s date of birth is a fact about the winner […]

Read More

Visualizing Earthquake Data

Giorgio Garziano continues a series on analyzing earthquake data: This is the third part of our post series about the exploratory analysis of a publicly available dataset reporting earthquakes and similar events within a specific 30 days time span. In this post, we are going to show static, interactive and animated earthquakes maps of different flavors by […]

Read More


January 2016
« Dec Feb »