Polybase Execution Plan With Blob Storage

I look at an execution plan and packet capture of a Polybase query which reads from Azure Blob Storage:

In this case, all of those packets were 1514 bytes, so it’s an easy multiplication problem to see that we downloaded approximately 113 MB.  The 2008.csv.bz2 file itself is 108 MB, so factoring in TCP packet overhead and that there were additional, smaller packets in the stream, I think that’s enough to show that we did in fact download the entire file.  Just like in the Hadoop scenario without MapReduce, the Polybase engine needs to take all of the data and load it into a temp table (or set of temp tables if you’re using a Polybase scale-out cluster) before it can pull out the relevant rows based on our query.

The upshot is that Polybase behaves very similarly on Azure Blob Storage as it does with on-prem Hadoop for non-MapReduce queries.

Related Posts

Automating Azure SQL Database Scaling

Arun Sirpal shows how to use Azure Logic Apps to auto-scale Azure SQL Database: When I was presenting my Azure SQL Database session at DataRelay (used to be SQLRelay) I was asked (over coffee) about auto scaling capabilities. Quite simply there is nothing out of the box to achieve this. The idea of auto scaling […]

Read More

Deploying An Azure Container Within A Virtual Network

Andrew Pruski shows us that you can now deploy an Azure container running SQL Server within an Azure virtual network: Up until now Azure Container Instances only had one option to allow us to connect. That was assigning a public IP address that was directly exposed to the internet. Not really great as exposing SQL […]

Read More

Categories

December 2016
MTWTFSS
« Nov Jan »
 1234
567891011
12131415161718
19202122232425
262728293031