Polybase Execution Plan With Blob Storage

I look at an execution plan and packet capture of a Polybase query which reads from Azure Blob Storage.

In this case, all of the captured packets were 1514 bytes, so it’s an easy multiplication problem to see that we downloaded approximately 113 MB.  The 2008.csv.bz2 file itself is 108 MB, so after factoring in TCP packet overhead and the additional, smaller packets in the stream, that’s enough to show that we did in fact download the entire file.  Just like in the Hadoop scenario without MapReduce, the Polybase engine needs to pull down all of the data and load it into a temp table (or a set of temp tables if you’re using a Polybase scale-out cluster) before it can filter out the relevant rows based on our query.
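To make the overhead math concrete, here is a quick back-of-the-envelope sketch. The 113 MB total and the 1514-byte frame size come from the capture above; the header sizes (14-byte Ethernet + 20-byte IPv4 + 20-byte TCP, leaving a typical 1460-byte TCP payload) are assumptions about a standard TCP stream, not values pulled from the capture itself.

```python
# Back-of-the-envelope check: how much actual file payload fits in
# ~113 MB of 1514-byte frames? Header sizes are standard assumptions.
FRAME_BYTES = 1514                      # size of each captured Ethernet frame
HEADER_BYTES = 14 + 20 + 20             # Ethernet + IPv4 + TCP headers
PAYLOAD_BYTES = FRAME_BYTES - HEADER_BYTES  # 1460 bytes, the typical MSS

captured_mb = 113                       # approximate total captured
payload_mb = captured_mb * PAYLOAD_BYTES / FRAME_BYTES
print(f"Estimated file payload: {payload_mb:.1f} MB")  # ~109 MB
```

That estimate lands within about a megabyte of the 108 MB file, which supports the conclusion that the whole file came across the wire.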

The upshot is that, for non-MapReduce queries, Polybase behaves much the same against Azure Blob Storage as it does against on-prem Hadoop: it downloads the entire file and filters rows locally.
