Graphing Row Counts With R

Kevin Feasel



I look at one use of R for DBAs:

I have a client data warehouse which holds daily rollups of revenue and cost for customers.  We’ve had some issues with the warehouse lately where data was not getting loaded due to system errors and timeouts, and our services team gave me a list of some customers who had gaps in their data due to persistent processing failures.  I figured out the root cause behind this (which will show up as tomorrow’s post), but I wanted to make sure that we filled in all of the gaps.

My obvious solution is to write a T-SQL query, getting some basic information by day for each customer.  I could scan through that result set, but the problem is that people aren’t great at reading tables of numbers; they do much better looking at pictures.  This is where R comes into play.

Click through for the code and a walkthrough of what each line is doing.

Related Posts

Testing Spatial Equilibrium Concepts With tidycensus

Ignacio Sarmiento Barbieri walks us through the concept of spatial equilibrium and tests using data from the tidycensus package: Let’s take the model to the data and reproduce figures 2.1. and 2.2 of “Cities, Agglomeration, and Spatial Equilibrium”. The focus are two cities, Chicago and Boston. These cities are chosen because both differ in how easy […]

Read More

Partitioning Data For Performance Improvement In R

John Mount shares a few examples of partitioning and parallelizing data operations in R: In this note we will show how to speed up work in R by partitioning data and process-level parallelization. We will show the technique with three different R packages: rqdatatable, data.table, and dplyr. The methods shown will also work with base-R and other packages. For each of the above […]

Read More


November 2016
« Oct Dec »