Press "Enter" to skip to content

Category: Warehousing

Lambda Architecture

Koos van Strien looks at lambda architecture and asks if it works for data warehouses:

The Lambda Architecture is pretty well documented – online as well as in the book I just mentioned. For a quick overview, Lambda Architecture is basically a system where the raw data is always stored, and never thrown away. All information that’s derived from this raw data is always recomputed – often stated as query = function(all data). This provides for a fool-proof architecture that’s rigorously simple (compared to classic RDBMS solutions), made up of three layers.

Admittedly, about half of this went over my head, but there are some good book and webpage recommendations to learn more about lambda architecture and Data Vault.
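Still, the query = function(all data) idea can be made a bit more concrete with a minimal T-SQL sketch. The table and view names here are my own invention, not from the post: derived results live only in a view that recomputes over an immutable, append-only raw table.

-- A loose illustration of query = function(all data): nothing ever
-- updates or deletes rows in the raw table, and the derived result
-- is recomputed from all raw events every time it's queried.
CREATE TABLE dbo.RawPageViews (
    PageUrl  VARCHAR(400) NOT NULL,
    ViewedAt DATETIME2    NOT NULL
);
GO

CREATE VIEW dbo.PageViewCounts AS
SELECT PageUrl, COUNT_BIG(*) AS ViewCount
FROM dbo.RawPageViews
GROUP BY PageUrl;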


Azure SQL Data Warehouse

Robert Sheldon has a getting started guide for Azure SQL Data Warehouse:

In my Simple-Talk article Azure SQL Data Warehouse, I introduced you to SQL Data Warehouse and gave you an overview of the architecture and technologies that drive the service and make it all work. In this article, I go a step further and provide details about getting started with SQL Data Warehouse, demonstrating how to add a sample database and then access the server and database settings.

If you want to follow along with my examples and try out SQL Data Warehouse for yourself, you must have an active Azure subscription, even if it’s just the free trial. For those who have already used up their free trial, be aware that SQL Data Warehouse is a pay-as-you-go service, even though it’s still in preview, so unless you’re on an unlimited company budget or happen to have accrued MSDN credits, you’ll want to be judicious in how you try out the service. Fortunately, as you’ll see in this article, you can pause the compute resources when not in use, helping to minimize the costs associated with learning about the service.

This article is all about initial installation and configuration.
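Since cost comes up in the quote, here's a hedged sketch of the one lever you can pull from T-SQL: scaling the compute tier (DWU) down when you're not using it. The database name is hypothetical, and pausing compute entirely is done through the Azure portal or PowerShell rather than T-SQL.

-- Hypothetical example: scale an Azure SQL Data Warehouse down to the
-- smallest compute tier. Run against the master database on the server.
ALTER DATABASE MySampleDW
MODIFY (SERVICE_OBJECTIVE = 'DW100');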


Junk Dimensions

Jesse Seymour talks about junk dimensions in warehousing:

I think one of the single biggest challenges I face as I attempt to warehouse data that originates as a SharePoint list is the handful of miscellaneous descriptive fields, such as approval status, request status, or something similar.  Typically, these fields are set up as Choice fields in the SharePoint list so they have a known range of values, but it’s still a pain to have to build a dimension for each one.

Enter the junk dimension.  Ever since I learned about this concept, it has made my life so much easier.  What the junk dimension does is cross join the different fields, creating a row for every possible combination of values.

Junk dimensions are nice for those low-cardinality attributes which are important but don’t really fit anywhere else.  The important thing to remember about a junk dimension is that you don’t want it to be too large:  if you have 5 attributes, each of which has 8 possible values, you have 8^5 (32,768) rows.  That’s not so bad, but make it 10 attributes and now your table has 1,073,741,824 rows, and that’s a lot of rows for a single dimension.  If you find yourself in that scenario, you might want to create two junk dimensions (bringing you back to 2 dimensions with 32K rows), review your design to see if all those attributes are necessary, or review your design to see if your “junk” dimension is hiding a real dimension.
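For illustration, here's a minimal T-SQL sketch of the cross-join pattern Jesse describes. The attribute names and value lists are made up for the example, not taken from his post.

-- Hypothetical junk dimension: cross join the known value lists of
-- three low-cardinality attributes, producing one row per combination.
CREATE TABLE dbo.DimRequestJunk (
    RequestJunkKey INT IDENTITY(1,1) PRIMARY KEY,
    ApprovalStatus VARCHAR(20) NOT NULL,
    RequestStatus  VARCHAR(20) NOT NULL,
    Priority       VARCHAR(10) NOT NULL
);

INSERT INTO dbo.DimRequestJunk (ApprovalStatus, RequestStatus, Priority)
SELECT a.Val, r.Val, p.Val
FROM (VALUES ('Pending'), ('Approved'), ('Rejected')) AS a(Val)
CROSS JOIN (VALUES ('Open'), ('In Progress'), ('Closed')) AS r(Val)
CROSS JOIN (VALUES ('Low'), ('Medium'), ('High')) AS p(Val);
-- 3 x 3 x 3 = 27 rows; the fact table then carries a single
-- RequestJunkKey instead of three separate foreign keys.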


Type 2 SCDs With Biml

Meagan Longoria has a great post on Type 2 Slowly Changing Dimensions:

The most common mistake I see in SCD 2 packages, whether using the built-in transformation or creating your own data flow, is that people use OLEDB commands to perform updates one row at a time rather than writing updates to a staging table and performing a set-based update on all rows.  If your dimension is small, the performance from row-by-row updates may be acceptable, but the overhead associated with using a staging table and performing a set-based update will probably be negligible. So why not keep a consistent pattern for all type 2 dimensions and require no changes if the dimension grows?

Spot on.
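For anyone who hasn't seen the pattern, here's a rough T-SQL sketch of the set-based approach she describes; the dimension and staging table names are hypothetical.

-- Hypothetical SCD 2 update: expire every changed row in one set-based
-- UPDATE from a staging table, instead of an OLEDB Command per row.
UPDATE d
SET    d.RowEndDate = s.RowStartDate,
       d.IsCurrent  = 0
FROM   dbo.DimCustomer AS d
JOIN   staging.CustomerChanges AS s
       ON s.CustomerBusinessKey = d.CustomerBusinessKey
WHERE  d.IsCurrent = 1;

-- The new versions of those rows then arrive in one set-based insert.
INSERT INTO dbo.DimCustomer
    (CustomerBusinessKey, CustomerName, RowStartDate, RowEndDate, IsCurrent)
SELECT CustomerBusinessKey, CustomerName, RowStartDate, '9999-12-31', 1
FROM   staging.CustomerChanges;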


The Logical Data Warehouse

Robert Sheldon is looking beyond the Enterprise Data Warehouse:

Organizations looking to take control of this onslaught of information are turning to other solutions to meet their data needs, either in addition to or instead of the traditional EDW. Quite often this means turning to a logical architecture that abstracts the inherent complexities of the big data universe. Such an approach embraces mixed environments through the use of distributed processing, data virtualization, metadata management, and other technologies that help ease the pain of accessing and federating data.

Dubbed the logical data warehouse (LDW), this virtual approach to a BI analytics infrastructure originated with Mark Beyer while he was participating in Gartner’s Big Data, Extreme Information and Information Capabilities Framework research in 2011. According to his blog post “Mark Beyer, Father of the Logical Data Warehouse, Guest Post,” Beyer believes that the way to approach analytical data is to focus on the logic of the information, rather than the mechanics.

This feels like something that first-movers are starting to adopt, but won’t be mainstream for another 6-8 years.  That should give the idea some time to mature as we see the first round of successes and (more importantly) failures.
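As one concrete (if simplified) taste of the data virtualization piece, PolyBase-style external tables let T-SQL query data that lives outside the warehouse and join it to local tables. Everything below (the storage account, paths, and table names) is hypothetical.

-- Loose illustration of data virtualization: external tables expose
-- files in Azure blob storage as if they were tables in the database.
CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://logs@mystorageaccount.blob.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

CREATE EXTERNAL TABLE dbo.ExternalWebLogs (
    LogDate  DATETIME2,
    PageUrl  VARCHAR(400),
    Visitors INT
)
WITH (
    LOCATION = '/weblogs/',
    DATA_SOURCE = AzureBlobStore,
    FILE_FORMAT = CsvFormat
);
-- ExternalWebLogs can now be joined to local warehouse tables in
-- ordinary SELECT statements, with no ETL to land the data first.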
