Press "Enter" to skip to content

Category: Warehousing

Azure SQL Data Warehouse Setup

Arun Sirpal configures a new instance of Azure SQL Data Warehouse:

The information shown here is the DSQL (Distributed SQL) plan – When you send a SQL query to SQL Data Warehouse, the Control node processes a query and converts the code to DSQL then the Control node sends the command to run in each of the compute nodes.

The returned query plan depicts sequential SQL statements; when the query runs it may involve parallelized operations, so some of the sequential statements shown may run at the same time. More information can be found at the following URL https://msdn.microsoft.com/en-us/library/mt631615.aspx.

Arun also looks at running a simple Power BI report off of Azure SQL Data Warehouse; click through for that.

Comments closed

Azure SQL Data Warehouse Architecture

Warner Chaves looks at system views in Azure SQL Data Warehouse:

Unlike the sys.dm_exec_requests view in SQL Server, the sys.dm_pdw_exec_requests view actually keeps up to 10000 records with the information of a request even after it has executed. This capability is very useful as you can track specific query executions as long as their records are still among the 10000 kept by the view. As time passes the oldest records are phased out in favor of more recent ones.

This is an interesting look at some of the differences between Azure SQL Data Warehouse and a “normal” SQL Server installation.  Good reading.

Comments closed

Date Dimension With DAX

Meagan Longoria shows how to create a date dimension using only DAX:

The fiscal calendar assumes calendar months with the fiscal calendar shifted by some number of months. It also assumes that the fiscal year is ahead of the calendar year. That is, fiscal year 2017 starts in the first day of some month in 2016. In my example script, the fiscal year starts in October. If you have some 4-4-5, 4-5-4, or other calendar, the fiscal calendar calculations in this script won’t help you.

To add this date dimension to your SSAS Tabular project:

  1. Right click on the Model.bim file in the Solution Explorer and choose View Code.

  2. If you have at least one table in the model already, locate the end of the definition of the previous table, inside the table array. Add a comma after the end curly brace for the previous table.

  3. Paste the JSON/TMSL from my Gist.

  4. Save and close the file.

  5. Right click on Model.Bim and choose View Designer.

Click through for the script and additional explanation.

Comments closed

Azure SQL Data Warehouse Plans

Grant Fritchey shows how to build an execution plan for an Azure SQL Data Warehouse query:

So now we just save this as a .sqlplan file and open it in SSMS, right?

Nope!

See, that’s not a regular execution plan, at all. Instead, it’s a D-SQL plan. It’s not the same as our old execution plans. You can’t open it as a graphical plan (and no, not even in that very popular 3rd party tool, I tried). You will have to learn how to read these plans differently because, well, they are different.

That’s an unfortunate outcome.  Reading is hard…

Comments closed

Documenting A Data Warehouse

Jesse Seymour discusses a few forms of documentation for a data warehouse:

Extended properties are a great way to internally document the data warehouse.  The key advantage here is that the values of these extended properties can be retrieved with a T-SQL query.  This allows us to access this information with a view as needed.  My favorite method of using this is to create an SSRS report that end users can run to look up the attributes and comments I store in the extended property.  Data warehouse tools take some of the pain out of the process.  Unfortunately, not all tools support use of extended properties.  Make sure your tool does or consider changing tools.  Be sure to document the names and use cases for each property you create.  Consistency is the key to the value here.

I’ve never been a big fan of extended properties, mostly because I typically don’t work with tools which expose that information easily.  Regardless, there are other important forms of documentation, so read on.

Comments closed

Migrating To Azure SQL Data Warehouse

Rangarajan Srirangam has a detailed article on steps you should take when migrating a database to Azure SQL Data Warehouse:

This article focuses on migrating data to Azure SQL Data Warehouse with tips and techniques to help you achieve an efficient migration. Once you understand the steps involved in migration, you can practice them by following a running example of migrating a sample database to Azure SQL Data Warehouse.

Migrating your data to Azure SQL Data Warehouse involves a series of steps. These steps are executed in three logical stages: Preparation, Metadata migration and Data migration.

It’s a lengthy read, but well worth it.

Comments closed

Getting Started With Azure SQL Data Warehouse

Warner Chaves looks at Azure SQL Data Warehouse:

The first thing to keep in mind is that ASDW was designed to be a cloud based system. As such, it aims to be very flexible for resource allocation and very efficient to scale up or down. To meet those goals, the system allows you too:

  • Increase or decrease compute power represented by Data Warehousing Units.

  • The amount of storage can grow and is charged independently from the compute power.

  • The compute power can be completely paused and only storage is payed at that point.

Warner also has a brand new Pluralsight course on the topic.

Comments closed

BigQuery Versus Redshift

Kiyoto Tamura compares Google’s BigQuery versus Amazon’s Redshift for cloud-based warehousing:

Neither service is truly “set and forget” and requires a dedicated engineer to learn the service and maintain it. You can use various tools to automate many aspects of the operation, but someone will have to maintain automation scripts and workflows.

That said, here are things that I’ve heard first-hand from talking to users

The bottom line there is that Redshift is a bit more mature than BigQuery today, but keep an eye on both of them.

Comments closed

Warehouse History

Kennie Pontoppidan delves into various aspects of collecting and storing history in warehouses:

In T2 history we have the two attributes ValidFromDate and ValidToDate. We can choose two different strategies for updating the values of these: using system time (load time) or business time. If we use system time for the T2 splits, the data warehouse history is dependent on when we load data. This makes it impossible to reload data in the data warehouse without messing up the data history. If we allow our load ETL procedures to use timestamps for business time (when data was really valid) for T2 history, we get the opportunity to reload data. But the cost of this flexibility is a much more complicated design for T2 splits. We also need to keep track of this metadata on the source system attributes.

Part of a warehouse’s value is its ability to replay historical data, but you can only do that if you store the data correctly (and query it correctly!).

Comments closed

Warehouses Will Live On

Jesse Seymour argues that in-memory analysis solutions will not entirely replace data warehouses:

The big reason that dimensional modeling increases clarity is that the dimensional model seeks to flatten data as much as possible.  Let’s compare two examples.  Both of these examples are for a fictional health clinic.

The first example is that we want a report on how many male patients were  treated with electric shock therapy by provider, grouped monthly and spanning year to date range.

Those big Kimball-style warehouses do a great job of making it easier for people who are not database specialists to query data and get meaningful, consistent results to known business questions.  The trick to understanding data platforms is that they tend to be complements rather than substitutes:  introducing Spark-R in your environment does not replace your Kimball-style warehouse; it complements it by letting analysts find trends more easily.  Similarly, a Hadoop cluster potentially lets you complement an existing data warehouse in a few ways:  acting as a data aggregator (which allows you to push some ETL work off onto the cluster), a data collector (especially for information which is useful but doesn’t really fit in your conformed warehouse), and a data processor (particularly for those gigantic queries which are not time-sensitive).

Comments closed