Press "Enter" to skip to content

The Shuffling Operator And Azure SQL DW

Arun Sirpal is ready to deal:

For the purposes of this post the TSQL shown is elementary (don’t be surprised by that), the point is really about SHUFFLE. So, I select the estimated plan for the following code.

SELECT SOD.[SalesOrderID],SOD.[ProductID], SOH.[TotalDue]
FROM [SalesLT].[SalesOrderDetail] SOD
JOIN [SalesLT].[SalesOrderHeader] SOH ON
SOH.[SalesOrderID] = SOD.[SalesOrderID]
WHERE SOH.[TotalDue] > 1000

Shuffle me once, why not shuffle me twice. If you REALLY want to see the EXPLAIN command output, then it looks like this snippet below.

The DSQL operation clearly states SHUFFLE_MOVE. Why am I getting this? What does it mean?

Shuffling data isn’t the worst thing in the world, but it is a fairly expensive operation all things considered.  Ideally, your warehouse architecture limits the number of shuffle operations, but considering that you can only hash on one key, sometimes it’s inevitable.