Taxi Rides

Kevin Feasel

2016-06-24

R

Mark Litwintschik has an ongoing taxi ride data analysis series.  This time, he gives PostgreSQL a run:

For this workload the reporting speeds don’t line up well with the price differences between the RDS instances. I suspect this workload is biased towards R’s CPU consumption when generating PNGs rather than RDS’ performance when returning aggregate results. The RDS instances share the same number of IOPS each which might erase any other performance advantage they could have over one another.

As for the money spent importing the data into RDS I suspect scaling up is more helpful when you have a number of concurrent users rather than a single, large job to execute.

This is an interesting series Mark has going.

Related Posts

Housing Prices In Ames, Iowa: A Kaggle Competition

Kathryn Bryant and M. Aaron Owen share their Kaggle experiences.  First, Kathryn, et al: The lifecycle of our project was a typical one. We started with data cleaning and basic exploratory data analysis, then proceeded to feature engineering, individual model training, and ensembling/stacking. Of course, the process in practice was not quite so linear and […]

Read More

Data Wrangling At Scale

John Mount has a short article showing off the cdata package: Suppose we needed to un-pivot this data into a row oriented representation. Often big data transform steps can achieve a much higher degree of parallelization with “tall data”. With the cdata package this transform is easy and performant, as we show below. Read the whole thing.

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930