Polybase And Azure SQL Data Warehouse

I have a post on using Polybase with Azure SQL Data Warehouse:

That’s a header row, and I’m okay with it not making its way in.  As a quick aside, I should note that I picked tailnum as my distribution key.  The airplane’s tail number is unique to that craft, so there absolutely will be more than 60 distinct values, and as I recall, this data set didn’t have too many NULL values.  After loading the 2008 data, I loaded all years’ data the same way, except selecting from dbo.Flights instead of Flights2008.

Click through for more details, including the CETAS statement, which I’d love to see in on-prem SQL Server.

Related Posts

Big Data Clusters In SQL Server 2019

James Serra lays out some of the architecture behind SQL Server 2019 Big Data Clusters: While extract, transform, load (ETL) has its use cases, an alternative to ETL is data virtualization, which integrates data from disparate sources, locations, and formats, without replicating or moving the data, to create a single “virtual” data layer.  The virtual […]

Read More

112 Million Cab Rides In Azure SQL Data Warehouse

Derik Hammer wants a real test of Azure SQL Data Warehouse: The method that I liked the most and finally settled on was to use a public dataset. I wanted data which was skewed in real ways and did not require a lot of work to massage. Microsoft has a great listing of public datasets here. […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031