SQL Data Warehouse Distribution Keys

Simon Whiteley explains the different distribution key options available in Azure SQL Data Warehouse and SQL Server APS:

Each record that is inserted goes onto the next available distribution. This guarantees that you will have a smooth, even distribution of data, but it means you have no way of telling which data is on which distribution. This isn’t always a problem!

If I wanted to perform a count of records, grouped by a particular field, I can perform this on a round-robin table. Each distribution will run the query in parallel and return it’s grouped results. The results can be simply added together as a second part of the query, and adding together 60 smaller datasets shouldn’t be a large overhead. For this kind of single-table aggregation, round-robin distribution is perfectly adequate!

However, the issues arise when we have multiple tables in our query. In order to join two tables. Let’s take a very simple join between a fact table and a dimension. I’ve shown 6 distributions for simplicity, but this would be happening across all 60.

Figuring out which distribution key to use can make a huge difference in performance.

Related Posts

Nested Loops, Hash, Or Merge: Which Is Best?

Grant Fritchey dodges the important questions: First response, also a joke, was the question at the title of this post: What is the preferred operator when joining tables: Hash Match, Nested Loops or Merge? While my immediate response to this question is, yes. Meaning, they’re all preferred, situationally. I decided to expand on that a […]

Read More

New Features In Public Preview On Azure SQL Database

Microsoft has a round of announcements for public previews on Azure SQL Database.  First up is Kevin Farlee announcing approximate count distinct: The new APPROX_COUNT_DISTINCT aggregate function returns the approximate number of unique non-null values in a group. This function is designed for use in big data scenarios and is optimized for the following conditions: Access of […]

Read More

Categories