Kevin Feasel



David Kun introduces the R Database Layer:

It is important to note that the SQL statements generated in the background are not executed unless explicitly requested by the command as.data.frame. Hence, you can merge, filter and aggregate your dataset on the database side and load only the result set into memory for R.

In general the design principle behind RDBL is to keep the models as close as possible to the usual data.frame logic, including (as shown later in detail) commands like aggregate, referencing columns by the \($\) operator and features like logical indexing using the \([]\) operator.

Check it out.  I’m not particularly excited about this for one simple reason:  SQL is a better data retrieval and connection DSL than an R-based mapper.  I get the value of sticking to one language as much as possible.  I also get that not all queries need to be well-optimized—for example, you might be running queries on a local machine or against a slice of data which is not connected to an operational production environment.  But I’m a big fan of using the right tool for the job, and the right tool for working with relational databases (and the “relational” part there is perhaps optional) is SQL.

Related Posts

Beware Multi-Assignment dplyr::mutate() Statements

John Mount hits on an issue when using dplyr backed by a database in R: Notice the above gives an incorrect result: all of the x_i columns are identical, and all of the y_i columns are identical. I am not saying the above code is in any way desirable (though something like it does arise naturally in certain test […]

Read More

The Theory Behind cdata

John Mount has a video explaining the concepts behind cdata: We also have two really nifty articles on the theory and methods: Fluid data reshaping with cdata Coordinatized Data: A Fluid Data Specification Please give it a try! Click through for the video, which I found very helpful in tying together a number of data […]

Read More


August 2016
« Jul Sep »