Polybase With Compression

I have a post looking at Polybase support for different compression formats:

This is a very interesting set of results. First, 7Zip-archived files do not work with the default encoding. I'm not particularly surprised by this: 7Zip support is relatively scarce across the board, and it's a niche file format (though a very efficient one).

The next failure case is tar, which is a weird one: it missed the first row in the file but collected the remaining 776 records. The same goes for .tar.gz. I unpacked the .tar file, and the SecondBasemen.csv file inside it did in fact have all 777 records, so something odd is going on with the codec rather than with the data.
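To repeat that tar sanity check without unpacking by hand, here is a minimal Python sketch that uses only the standard library. The member name SecondBasemen.csv comes from the post. The helper names and the generated 777-row stand-in data are hypothetical. The sketch counts rows in the CSV member via `tarfile`, then shows what the first "line" looks like if the (gunzipped) archive is read as plain text. That second part is my assumption about the failure, not something the original test confirmed. A tar archive begins with a 512-byte header block holding the member name, so a reader without tar support would see that header glued to the first CSV row. That could explain why exactly one row went missing.

```python
import gzip
import io
import tarfile


def rows_in_member(archive_bytes: bytes, member_name: str) -> int:
    """Count rows in one CSV member of a plain or gzipped tar archive."""
    # Mode "r:*" lets tarfile detect plain tar vs. tar.gz on its own.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        data = tar.extractfile(member_name).read()
    return len(data.splitlines())


def first_line_as_plain_text(archive_bytes: bytes) -> bytes:
    """Return the first line a tar-unaware reader would see.

    Assumption: this approximates a text reader that gunzips the stream
    but knows nothing about tar headers.
    """
    raw = archive_bytes
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw.split(b"\n", 1)[0]


def build_demo_archive(csv_text: str, member_name: str, gzipped: bool) -> bytes:
    """Hypothetical helper: pack csv_text into an in-memory tar or tar.gz."""
    payload = csv_text.encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz" if gzipped else "w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    # Stand-in data with 777 rows, matching the record count in the post.
    csv_text = "".join(f"Player{i},2B,{i}\n" for i in range(777))
    archive = build_demo_archive(csv_text, "SecondBasemen.csv", gzipped=True)
    print(rows_in_member(archive, "SecondBasemen.csv"))  # 777
    first = first_line_as_plain_text(archive)
    print(first.startswith(b"SecondBasemen.csv"))  # True: tar header comes first
    print(b"Player0,2B,0" in first)  # True: row 1 is glued to the header
```

With a real file, read its bytes and pass them to `rows_in_member` in place of the generated archive.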

Stick to BZip2 and GZip if you’re using flat files.
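One way to enforce that advice before creating an external table is to sniff each file's signature. The minimal Python sketch below is my own illustration, not part of Polybase. It checks the well-known magic bytes for 7z (37 7A BC AF 27 1C), gzip (1F 8B), and bzip2 ("BZh"). It also checks for the "ustar" marker that POSIX tar headers carry at offset 257. A .tar.gz also begins with the gzip signature, so the sketch peeks at the first decompressed block to tell the two apart. The function names and the accept list are assumptions for illustration.

```python
import bz2
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"
TAR_MAGIC_OFFSET = 257  # ustar headers carry b"ustar" at this offset


def _looks_like_tar(block: bytes) -> bool:
    return block[TAR_MAGIC_OFFSET:TAR_MAGIC_OFFSET + 5] == b"ustar"


def classify(data: bytes) -> str:
    """Return a coarse format label based on a file's leading bytes."""
    if data.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    if data.startswith(GZIP_MAGIC):
        # Peek at the first decompressed block to spot a gzipped tar.
        inner = gzip.GzipFile(fileobj=io.BytesIO(data)).read(512)
        return "tar.gz" if _looks_like_tar(inner) else "gzip"
    if data.startswith(BZIP2_MAGIC):
        inner = bz2.BZ2File(io.BytesIO(data)).read(512)
        return "tar.bz2" if _looks_like_tar(inner) else "bzip2"
    if _looks_like_tar(data[:512]):
        return "tar"
    return "plain"


def is_recommended_compression(data: bytes) -> bool:
    """Accept only the two formats the test results above recommend."""
    return classify(data) in {"gzip", "bzip2"}
```

Pass the full file contents. The gzip and bzip2 branches decompress the first 512 bytes, and a truncated stream can raise an error.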

