Polybase With Compression

I have a post looking at Polybase support for different compression formats:

This is a very interesting set of results.  First, 7Zip archived files do not work with the default encoding.  I’m not particularly surprised by this result, as 7Zip support is relatively scarce across the board and it’s a niche file format (though a very efficient format).

The next failure case is tar.  Tar is a weird case because it missed the first row in the file but was able to collect the remaining 776 records.  Same goes for .tar.gz.  I unpackaged the .tar file and the constituent SecondBasemen.csv file did in fact have all 777 records, so it’s something weird about the codec.

Stick to BZip2 and GZip if you’re using flat files.

Related Posts

Polybase Rejected Row Location

Casey Karst announces a nice improvement to Polybase on Azure SQL Data Warehouse: Every row of your data is an insight waiting to be found. That is why it is critical you can get every row loaded into your data warehouse. When the data is clean, loading data into Azure SQL Data Warehouse is easy […]

Read More

ggplot2 Scales And Coordinates

I continue my series on ggplot2: The other thing I want to cover today is coordinate systems.  The ggplot2 documentation shows seven coordinate functions.  There are good reasons to use each, but I’m only going to demonstrate one.  By default, we use the Cartesian coordinate system and ggplot2 sets the viewing space.  This viewing space covers the […]

Read More

Categories

November 2016
MTWTFSS
« Oct Dec »
 123456
78910111213
14151617181920
21222324252627
282930