Scaling Azure Data Warehouse

Kevin Feasel

2016-08-01

Cloud

Vincent-Philippe Lauzon looks at how Azure Data Warehouse scales:

Which data gets stored in which database?

As long as you are doing simple select on one table and that your data is distributed evenly, you shouldn’t care, right?  The query will flow to the compute nodes, they will perform the query on each database and the result will be merged together by the control node.

But once you start joining data from multiple tables, ADW will have to swing data around from one database to another in order to join the data.  This is called Data Movement.  It is impossible to avoid in general but you should strive to minimize it to obtain better performance.

This is a look primarily at the underlying mechanics rather than testing a particular load.  Check it out.
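To make the Data Movement idea in the quoted passage a little more concrete, here is a minimal toy sketch in Python. It is not how Azure Data Warehouse is implemented: the modulo placement rule, the `orders` rows, the column names, and the count of four distributions are all assumptions made up for illustration. It only counts how many rows of one table would sit in the "wrong" database for a join, depending on which column the table was distributed on.

```python
# Toy model of distributed placement. All names and numbers here are
# hypothetical; the real service uses its own internal hash function.

def distribution_for(value, n_distributions):
    """Toy placement rule: map an integer column value to a distribution."""
    return value % n_distributions


def rows_needing_movement(rows, placed_by, join_key, n_distributions):
    """Count rows whose current distribution differs from the one that
    holds the matching join-key value (assuming the other table is
    distributed on that join key)."""
    moved = 0
    for row in rows:
        current = distribution_for(row[placed_by], n_distributions)
        target = distribution_for(row[join_key], n_distributions)
        if current != target:
            moved += 1
    return moved


N_DISTRIBUTIONS = 4  # assumption for the sketch, not a service setting

# Hypothetical fact table: 20 orders spread over 5 customers.
orders = [{"order_id": i, "customer_id": i % 5} for i in range(20)]

# Case 1: orders distributed on the join column, so rows are already co-located.
aligned = rows_needing_movement(orders, "customer_id", "customer_id", N_DISTRIBUTIONS)

# Case 2: orders distributed on order_id, so many rows must be shuffled to join.
misaligned = rows_needing_movement(orders, "order_id", "customer_id", N_DISTRIBUTIONS)

print("rows to move when distributed on customer_id:", aligned)
print("rows to move when distributed on order_id:", misaligned)
```

In this sketch, distributing both sides of the join on the same column leaves nothing to move, while distributing on an unrelated column forces most rows to be relocated before the join can run. That is the intuition behind minimizing data movement, under the simplifying assumptions above.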

