Azure SQL Data Warehouse Patterns

Murshed Zaman shows us a couple of patterns and anti-patterns for Azure SQL Data Warehouse:

Azure SQL DW is a Massively Parallel Processing (MPP) data warehousing service. It is a service because Microsoft maintains the infrastructure and software patching to make sure it’s always on up to date hardware and software on Azure. The service makes it easy for a customer to start loading their tables on day one and start running queries quickly and allows scaling of compute nodes when needed.

In an MPP database, table data is distributed among many servers (known as compute or slave nodes), and in many MPP systems shared-nothing storage subsystems are attached to those servers. Queries come through a head (or master) node where the location metadata for all the tables/data blocks resides. This head node knows how to deconstruct the query into smaller queries, introduce various data movement operations as needed, and pass smaller queries on to the compute nodes for parallel execution. Data movement is needed to align the data by the join keys from the original query. The topic of data movement in an MPP system is a whole another blog topic by itself, that we will tackle in a different blog. Besides Azure SQL DW, some other examples of a MPP data warehouses are Hadoop (Hive and Spark), Teradata, Amazon RedShift, Vertica, etc.

The opposite of MPP is SMP (Symmetric Multiprocessing) which basically means the traditional one server systems. Until the invention of MPP we had SMP systems. In database world the examples are traditional SQL Server, Oracle, MySQL etc. These SMP databases can also be used for both OLTP and OLAP purposes.

Murshed spends the majority of this blog post covering things you should not do, which is probably for the best.

Related Posts

Working With The Databricks API Via Powershell

Gerhard Brueckl has a Powershell module for interacting with Databricks, either Azure or AWS: As most of our deployments use PowerShell I wrote some cmdlets to easily work with the Databricks API in my scripts. These included managing clusters (create, start, stop, …), deploying content/notebooks, adding secrets, executing jobs/notebooks, etc. After some time I ended […]

Read More

Migrating A Database To Managed Instances

Frank Gill shows how to migrate a database from on-premises to an Azure SQL Managed Instance: If you have run through my last Managed Instance blog post, you have a Managed Instance at your disposal.  The PowerShell script for creating the network requirements also contains steps to create an Azure VM in a different subnet in […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930