SQL Data Warehouse Distribution Keys

Simon Whiteley explains the different distribution key options available in Azure SQL Data Warehouse and SQL Server APS:

Each record that is inserted goes onto the next available distribution. This guarantees that you will have a smooth, even distribution of data, but it means you have no way of telling which data is on which distribution. This isn’t always a problem!

If I wanted to perform a count of records, grouped by a particular field, I can perform this on a round-robin table. Each distribution will run the query in parallel and return it’s grouped results. The results can be simply added together as a second part of the query, and adding together 60 smaller datasets shouldn’t be a large overhead. For this kind of single-table aggregation, round-robin distribution is perfectly adequate!

However, the issues arise when we have multiple tables in our query. In order to join two tables. Let’s take a very simple join between a fact table and a dimension. I’ve shown 6 distributions for simplicity, but this would be happening across all 60.

Figuring out which distribution key to use can make a huge difference in performance.

Related Posts

Reducing Reads In Queries

Bert Wagner has a few tips for improving query performance by reducing the number of reads: If SQL Server thinks it only is going to read 1 row of data, but instead needs to read way more rows of data, it might choose a poor execution plan which results in more reads. You might get a suboptimal […]

Read More

Creating An Azure Chat Bot

Dustin Ryan shows how to build a QnA bot: After you’ve created your knowledge base you can then edit and update your knowledge base. There’s a few different ways to update your knowledge. a. Manually edit the knowledge base directly within QnAMaker.ai. You can do this by directly editing the questions by modifying the text […]

Read More