112 Million Cab Rides In Azure SQL Data Warehouse

Kevin Feasel

2018-10-23

Cloud, Data

Derik Hammer wants a real test of Azure SQL Data Warehouse:

The method that I liked the most and finally settled on was to use a public dataset. I wanted data which was skewed in real ways and did not require a lot of work to massage. Microsoft has a great listing of public datasets here.

I decided to go with the NYC Taxi and Limousine Commission (TLC) Trip Record Data. Data is available for most taxi and limousine fares with pickup/drop-off and distance information between January 2009 and June 2018. This includes data for Yellow cab, Green cab, and for hire vehicles. Just the Yellow cab data from 01/2016 – 06/2018 is over 112,000,000 records (24 GBs) and they download into easy to import comma separated values (CSV) files.

Read on to see how you can set it up yourself.  As Derik points out at the end, though, this is still one big table, but there are a few columns which can lead to dimensions, things like rate code, location, and payment type.

Related Posts

Backing Up Database to Azure Blob Storage

Jamie Wick shows us how we can back up database directly to Azure Blob Storage: Azure storage, as a backup destination for SQL backups, is a great option for organizations that are contemplating replacing older on-prem NAS appliances or improve their Disaster Recovery functionality. The tiered storage pricing, along with local and global redundancy options, […]

Read More

Scaling Power BI Premium Capacity

Matt Allington gives us instructions on how to scale Power BI Premium capacity: This is the third article in my series about how to make Power BI Premium more affordable for small to medium sized enterprises (SMEs).  In my first article I explained the problem and the logic behind how to configure a workable solution. […]

Read More

Categories

October 2018
MTWTFSS
« Sep Nov »
1234567
891011121314
15161718192021
22232425262728
293031