Running MapReduce Polybase Queries

I have a post which digs into what happens when a Polybase query invokes a MapReduce job:

Stream 2 sends along 27 MB worth of data.  It’s packaging everything Polybase needs to perform operations, including JAR files and any internal conversion work that the Polybase engine might need to translate Hadoop results into SQL Server results.  For example, the sqlsort at the bottom is a DLL that performs ordering using SQL Server-style collations, as per the Polybase academic paper.

Stream 2 is responsible for almost every packet from 43 through 19811; only 479 packets in that range belonged to some other stream (19811 – 43 – 18864 – 425 = 479).  We send all of this data to the data node via port 50010.

If you love looking at Wireshark streams, you’ll love this post.

Related Posts

Cassandra To Kafka Connect

Mike Barlotta shows how to feed data into Kafka from Cassandra via Kafka Connect.  Part one involves basic setup: Modeling data in Cassandra must be done around the queries that are needed to access the data (see this article for details). Typically this means that there will be one table for each query and data (in our […]

Read More

Use Cases For Apache Kafka

Amy Boyle shows a few scenarios where New Relic uses Apache Kafka: The Events Pipeline team is responsible for plumbing some of New Relic’s core data streams-specifically, event data. These are fine-grained nuggets of monitoring data that record a single event at a particular moment in time. For example, an event could be an error thrown […]

Read More


November 2016
« Oct Dec »