Polybase With Compression

I have a post looking at Polybase support for different compression formats:

This is a very interesting set of results.  First, 7Zip archived files do not work with the default encoding.  I’m not particularly surprised by this result, as 7Zip support is relatively scarce across the board and it’s a niche file format (though a very efficient format).

The next failure case is tar.  Tar is a weird case because it missed the first row in the file but was able to collect the remaining 776 records.  Same goes for .tar.gz.  I unpackaged the .tar file and the constituent SecondBasemen.csv file did in fact have all 777 records, so it’s something weird about the codec.

Stick to BZip2 and GZip if you’re using flat files.

Related Posts

MRAppMaster Errors Running MapReduce Jobs

I have a post looking at potential causes when PolyBase MapReduce jobs are unable to find the MRAppMaster class: Let me tell you about one of my least favorite things I like to see in PolyBase: Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster This error is not limited to PolyBase but is instead […]

Read More

PolyBase and Hive Shim Errors

I ran into a problem with Hive 3 and PolyBase: My initial plan was to google things. The specific error: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number. That pops up HIVE-15326 and HIVE-15016 but gave me no immediate joy. After reaching out to James Rowland-Jones (t), we (by which I mean he) eventually figured out the issue. Click through […]

Read More

Categories

November 2016
MTWTFSS
« Oct Dec »
 123456
78910111213
14151617181920
21222324252627
282930