Taxi Rides

Kevin Feasel



Mark Litwintschik has an ongoing taxi ride data analysis series.  This time, he gives PostgreSQL a run:

For this workload the reporting speeds don’t line up well with the price differences between the RDS instances. I suspect this workload is biased towards R’s CPU consumption when generating PNGs rather than RDS’ performance when returning aggregate results. The RDS instances share the same number of IOPS each which might erase any other performance advantage they could have over one another.

As for the money spent importing the data into RDS I suspect scaling up is more helpful when you have a number of concurrent users rather than a single, large job to execute.

This is an interesting series Mark has going.

Related Posts

Testing Spatial Equilibrium Concepts With tidycensus

Ignacio Sarmiento Barbieri walks us through the concept of spatial equilibrium and tests using data from the tidycensus package: Let’s take the model to the data and reproduce figures 2.1. and 2.2 of “Cities, Agglomeration, and Spatial Equilibrium”. The focus are two cities, Chicago and Boston. These cities are chosen because both differ in how easy […]

Read More

Partitioning Data For Performance Improvement In R

John Mount shares a few examples of partitioning and parallelizing data operations in R: In this note we will show how to speed up work in R by partitioning data and process-level parallelization. We will show the technique with three different R packages: rqdatatable, data.table, and dplyr. The methods shown will also work with base-R and other packages. For each of the above […]

Read More


June 2016
« May Jul »