Polybase Use Cases

James Serra talks about Polybase use cases:

For federated queries: “N” requires all data from the source to be copied into SQL Server 2016 and then filtered.  For “Y”, the query is pushed down into the data source and only the results are returned back, which can be much faster for large amounts of data.

I mention “Maybe” for age out data in SQL DW as you can use PolyBase to access the aged-out data in blob or Azure Data Lake Storage (ADLS), but it will have to import all the data so may have slower performance (which is usually ok for accessing data that is aged-out).  For SQL Server 2016, it will have to import the data unless you use HDP/Cloudera, in which case the creation of the MapReduce job will add overhead.

The thing that I like about this chart is that the new Polybase sources (SQL Server, Oracle, Teradata, Mongo, and generic ODBC) do support predicate pushdown.  For large data sets, that’s huge:  it lets the database engine on the opposite end do as much filtering as possible before sending results back to your SQL Server head node.

Related Posts

PolyBase and Hive Shim Errors

I ran into a problem with Hive 3 and PolyBase: My initial plan was to google things. The specific error: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number. That pops up HIVE-15326 and HIVE-15016 but gave me no immediate joy. After reaching out to James Rowland-Jones (t), we (by which I mean he) eventually figured out the issue. Click through […]

Read More

PolyBase and Pushdown Limitations

I have a post covering something I learned about predicate pushdown against Hadoop in PolyBase: Before I start, let’s talk about predicate pushdown for a moment. The gist of it is that when you have data in two sources, you have two options for combining the data: 1. Bring the data in its entirety from […]

Read More

Categories

March 2017
MTWTFSS
« Feb Apr »
 12345
6789101112
13141516171819
20212223242526
2728293031