Polybase DMVs

I look at the DMVs associated with Polybase and external table creation:

Let’s walk through this one step at a time and understand what the DMV is telling us.  Unfortunately, the DMV documentation is a little sparse, so some of this is guesswork on my part.

  1. A RandomIDOperation appears to create a temporary table.  In this case, the table (whose name is randomly generated) is named TEMP_ID_53.  I’m not sure where that name comes from; the session I ran this from was 54, so it wasn’t a session ID.

  2. After the table gets created, each Compute node gets told to create a table called TMP_ID_53 in tempdb whose structure matches our external table’s structure.  One thing you can’t see from the screenshot is that this table is created with DATA_COMPRESSION = PAGE.  I have to wonder if that’d be the same if my Compute node were on Standard edition.

  3. We add an extended property on the table, flagging it as IS_EXTERNAL_STREAMING_TABLE.

  4. We then update the statistics on that temp table based on expected values.  629 rows are expected here.

  5. Then, we create the dest stat, meaning that the temp table now has exactly the same statistics as our external table.

  6. The next step is that the Head node begins a MultiStreamOperation, which tells the Compute nodes to begin working.  This operator does not show up in the documentation, but we can see that the elapsed time is 58.8 seconds, which is just about as long as my query took.  My guess is that this is where the Head node passes code to the Compute nodes and tells them what to do.

  7. We have a HadoopRoundRobinOperation on DMS, which stands for “Data Movement Step” according to the location_type documentation.  What’s interesting is that according to the DMV, that operation is still going.  Even after I checked it 40 minutes later, it still claimed to be running.  If you check the full query, it’s basically a SELECT * from our external table.

  8. Next is a StreamingReturnOperation, which includes our predicate WHERE dest = ‘ORD’ in it.  This is a Data Movement Step and includes all of the Compute nodes (all one of them, that is) sending data back to the Head node so that I can see the results.

  9. Finally, we drop TEMP_ID_53 because we’re done with the table.

This post was about 70% legwork and 30% guesswork.  That’s a bit higher a percentage than I’d ideally like, but there isn’t that much information readily available yet, so I’m trying (in my own small way) to fix that.

Related Posts

Scripting Maintenance Mode Tasks

Jamie Wick shares some hard-earned knowledge regarding scripting out maintenance tasks using Powershell: Given that we have several hundred servers (and growing), this process is taking an increasing amount of time each month. Over the years we’ve implemented various automated patching systems (WSUS, IBM BigFix, etc.) and they’ve worked reasonably well for managing the Download & […]

Read More

Azure Without ARM

Ed Elliott gives us a few ways of deploying Azure resources without using ARM templates: So, what are our options? Create/Edit/Delete ourselves using Powershell/.Net/Python/Go/Java/Some Other SDK Process something else (YAML?) into JSON Generate the ARM using c#/Powershell/something else 3rd party tools, (Terraform is the big daddy) / others include Sparkle Formation To be honest, I’d […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930