Scaling Azure Data Warehouse

Kevin Feasel

2016-08-01

Cloud

Vincent-Philippe Lauzon looks at how Azure Data Warehouse scales:

Which data gets stored in which database?

As long as you are doing a simple select on one table and your data is distributed evenly, you shouldn’t care, right? The query will flow to the compute nodes, they will perform the query on each database, and the results will be merged together by the control node.
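To make the single-table case concrete, here is a minimal Python sketch of the scatter-gather pattern described above. It is a toy model, not ADW's actual implementation: the database count, the modulo placement rule (standing in for a hash function), and the names `distribute`, `local_sum`, and `control_node_sum` are all assumptions made for illustration.

```python
# Toy model: NUM_DATABASES and every function name here are hypothetical.
NUM_DATABASES = 4  # small number picked for readability, not taken from the source


def distribute(rows, key, n=NUM_DATABASES):
    """Place each row in a database using modulo as a stand-in hash."""
    databases = [[] for _ in range(n)]
    for row in rows:
        databases[row[key] % n].append(row)
    return databases


def local_sum(database, column):
    """Work done by one compute node: aggregate only its own slice."""
    return sum(row[column] for row in database)


def control_node_sum(databases, column):
    """Work done by the control node: merge the partial results."""
    return sum(local_sum(db, column) for db in databases)


sales = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
databases = distribute(sales, "customer_id")
total = control_node_sum(databases, "amount")

print([len(db) for db in databases])  # rows held by each database
print(total)                          # same answer as summing the whole table
```

Because each partial sum only touches rows that already live in its own database, no rows need to change location to answer this kind of query.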

But once you start joining data from multiple tables, ADW will have to swing data around from one database to another in order to join the data.  This is called Data Movement.  It is impossible to avoid in general but you should strive to minimize it to obtain better performance.
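A rough way to see why the choice of distribution key matters for joins is to count how many rows would have to change databases before matching rows sit side by side. The Python sketch below is a simplified model under stated assumptions: the database count, the modulo placement rule (a stand-in for a hash function), the sample `orders` data, and the helper names `placement` and `rows_to_move` are hypothetical and are not part of ADW.

```python
# Toy model of data movement: all names and numbers are illustrative assumptions.
NUM_DATABASES = 4


def placement(key_value, n=NUM_DATABASES):
    """Database a row lands in for a given key value (modulo as a stand-in hash)."""
    return key_value % n


def rows_to_move(rows, stored_by, join_key, n=NUM_DATABASES):
    """Count rows not already in the database where the join needs them."""
    return sum(
        1
        for row in rows
        if placement(row[stored_by], n) != placement(row[join_key], n)
    )


# Hypothetical orders table; a customers table is assumed to be
# distributed on customer_id, which is also the join column.
orders = [{"order_id": i, "customer_id": (i * 3) % 5 + 1} for i in range(1, 13)]

# Orders distributed on the join column: matching rows are already co-located.
moved_if_colocated = rows_to_move(orders, "customer_id", "customer_id")

# Orders distributed on order_id: most rows sit in the "wrong" database.
moved_if_by_order = rows_to_move(orders, "order_id", "customer_id")

print(moved_if_colocated, moved_if_by_order)
```

In this model, distributing both tables on the join column removes the movement entirely, while distributing on an unrelated column forces most rows to be shuffled first. Real workloads join on more than one column, which is why movement can be reduced but not eliminated in general.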

This is a look primarily at the underlying mechanics rather than testing a particular load.  Check it out.

Related Posts

What’s Happening In Azure Data Factory Right Now?

Melissa Coates has a couple of PowerShell scripts to figure out which pipelines are currently running in Azure Data Factory v1: This is a quick post to share a few scripts to find what is currently executing in Azure Data Factory. These PowerShell scripts are applicable to ADF version 1 (not version 2 which uses different cmdlets). Prerequisite: […]

Setting Up SparklyR In Azure

David Smith shows how you can spin up a Spark cluster in Azure and install SparklyR on top of it: The SparklyR package from RStudio provides a high-level interface to Spark from R. This means you can create R objects that point to data frames stored in the Spark cluster and apply some familiar R paradigms (like dplyr) […]
