Split Query Processing In Polybase

David DeWitt, et al, describe the Polybase engine in an academic article:

When compiling a SQL query that references an external table stored in an HDFS file, the PDW Engine Service contacts the Hadoop Namenode for information about the file. This information, combined with the number of DMS instances in the PDW cluster, is used to calculate the portion (offset and length) of the input file(s) each DMS instance should read from HDFS. This information is passed to DMS in the HDFS Shuffle step of the DSQL (distributed SQL) plan along with other information needed to read the file, including the file’s path, the location of the appropriate Namenode, and the name of the RecordReader that the bridge should use.

The system attempts to evenly balance the number of bytes read by each DMS instance. Once the DMS instances obtain split information from the Namenode, each can independently read the portion of the file it is assigned, directly communicating with the appropriate Datanodes without any centralized control.

This is a very clear paper which helps describe the core constructs of Polybase.  Highly recommended.

Related Posts

PolyBase — SQL to SQL

I have a post covering PolyBase from SQL Server to SQL Server: Historically, PolyBase has three separate external entities: external data sources, external file formats, and external tables. External data sources tell SQL Server where the remote data is stored. External file formats tell SQL Server what the shape of that data looks like—in other […]

Read More

DW Databases in PolyBase

I look at some databases people tend to ignore: Today is a fairly short post covering a trio of databases you might not even know you have: DWConfiguration, DWDiagnostics, and DWQueue. The PolyBase installer drops all three of these on your instance. Let’s go in ascending order of the number of useful tables. There are […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031