
Category: Warehousing

OLAP On Hadoop

Tim Spann discusses OLAP options on the Hadoop stack:

Apache Kylin

For an introduction to this interesting Hadoop project, check out this article. Apache Kylin, originally from eBay, is a Distributed Analytics Engine that provides SQL and OLAP access to Hadoop datasets using Hive and HBase. It can be called through SparkSQL as well, making for a very useful project. The project lets you work with Power BI, Tableau, and Excel, with more tool support coming soon. You can build MOLAP cubes and support many users with fast queries over billions of rows. Apache Kylin provides JDBC and ODBC drivers.

There are a few interesting options here.
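
As a minimal sketch of what SQL access to a Kylin cube looks like from client code, the following assumes pyodbc plus an ODBC DSN named "Kylin" configured with the Kylin ODBC driver mentioned above; the DSN, credentials, and the sample table (kylin_sales) are placeholders, not anything from the article.

```python
# Minimal sketch: query an Apache Kylin cube over ODBC.
# Assumes an ODBC DSN named "Kylin" configured with the Kylin ODBC driver;
# the DSN, credentials, and table/column names are placeholders.
import pyodbc

conn = pyodbc.connect("DSN=Kylin;UID=ADMIN;PWD=KYLIN", autocommit=True)
cursor = conn.cursor()

# Aggregations like this are answered from the pre-built MOLAP cube
# rather than by scanning the underlying Hive data.
cursor.execute("""
    SELECT part_dt, SUM(price) AS total_sales
    FROM kylin_sales
    GROUP BY part_dt
    ORDER BY part_dt
""")

for part_dt, total_sales in cursor.fetchall():
    print(part_dt, total_sales)

conn.close()
```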


Comparing Impala To Redshift

Mostafa Mokhtar, et al, have a comparison of Apache Impala to Amazon Redshift:

For this analysis, we used TPC-DS on a 3TB dataset and selected 70 out of the 99 queries that run without any modifications or use variants on both Redshift and Impala. We wanted to use a larger dataset (similar to what we’ve used in previous benchmarks), but due to Redshift’s data load times, we had to reduce the data size. (Note: This benchmark is derived from the TPC-DS benchmark and, as such, is not directly comparable to published TPC-DS results.)

This is coming from one of the two vendors, so take it with however many grains of salt you’d like.
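
For a sense of how this kind of comparison is typically run, here is a rough sketch of a query-timing harness (not the benchmark's actual tooling, just an illustration of the general approach); the DSN name and the directory of TPC-DS query files are placeholders.

```python
# Rough sketch of a query-timing harness (not the actual benchmark code).
# Assumes an ODBC DSN for the target engine; DSN and paths are placeholders.
import glob
import time
import pyodbc

conn = pyodbc.connect("DSN=Impala", autocommit=True)
cursor = conn.cursor()

timings = {}
for path in sorted(glob.glob("tpcds_queries/*.sql")):
    with open(path) as f:
        sql = f.read()
    start = time.monotonic()
    cursor.execute(sql)
    cursor.fetchall()  # drain the result set so the full query cost is measured
    timings[path] = time.monotonic() - start

# Slowest queries first.
for path, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{seconds:8.2f}s  {path}")

conn.close()
```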


Automating Data Warehouse Testing

Koos van Strien discusses warehouse testing:

Case: we’ve integrated two sources of customers. We want to add a third source.

Q: How do we know that our current integration and solutions will continue to work while at the same time integrating the new sources?

A: Test it.

Q: How do we get faster deployments and more stability?

A: Automate the tests, so they can run continuously.

This is an interesting concept; do read the whole thing.
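
As one concrete (and entirely hypothetical) flavor of such an automated test, the sketch below uses pytest and pyodbc to reconcile customer counts between a new source system and the integrated warehouse; the connection strings, table names, and the 'Source3' system code are assumptions for illustration, not anything from Koos's post.

```python
# Hypothetical automated reconciliation tests (pytest + pyodbc).
# Connection strings, table names, and the source system code are placeholders.
import pyodbc

SOURCE_CONN = "DSN=Source3"
DWH_CONN = "DSN=Warehouse"


def scalar(conn_str, sql):
    """Run a query that returns a single value and return it."""
    conn = pyodbc.connect(conn_str)
    try:
        return conn.cursor().execute(sql).fetchval()
    finally:
        conn.close()


def test_new_source_customers_are_loaded():
    # Every active customer in the new source should appear in the integrated dimension.
    source_count = scalar(SOURCE_CONN,
                          "SELECT COUNT(*) FROM dbo.Customers WHERE IsActive = 1")
    dwh_count = scalar(DWH_CONN,
                       "SELECT COUNT(*) FROM dbo.DimCustomer WHERE SourceSystem = 'Source3'")
    assert dwh_count == source_count


def test_existing_integration_still_works():
    # Guard the existing sources: no customer keys should be duplicated by the new load.
    duplicates = scalar(DWH_CONN, """
        SELECT COUNT(*) FROM (
            SELECT CustomerKey FROM dbo.DimCustomer
            GROUP BY CustomerKey HAVING COUNT(*) > 1
        ) d""")
    assert duplicates == 0
```

Hooking tests like these into a build server is what turns them from a one-off check into the continuous safety net the post describes.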


Azure SQL Data Warehouse Setup

Arun Sirpal configures a new instance of Azure SQL Data Warehouse:

The information shown here is the DSQL (Distributed SQL) plan – when you send a SQL query to SQL Data Warehouse, the Control node processes the query, converts the code to DSQL, and then sends the command to run on each of the compute nodes.

The returned query plan depicts sequential SQL statements; when the query runs it may involve parallelized operations, so some of the sequential statements shown may run at the same time. More information can be found at the following URL https://msdn.microsoft.com/en-us/library/mt631615.aspx.
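
You can retrieve that DSQL plan yourself with the EXPLAIN statement, which returns the distributed plan as XML instead of executing the query. A minimal sketch, where the server, credentials, and sample query are placeholders:

```python
# Minimal sketch: retrieve the DSQL (distributed) plan for a query without running it.
# EXPLAIN in Azure SQL Data Warehouse returns the plan as a single XML value.
# Server, credentials, and the sample query are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydw;UID=admin_user;PWD=..."
)
cursor = conn.cursor()

cursor.execute("EXPLAIN SELECT COUNT(*) FROM dbo.FactSales")
plan_xml = cursor.fetchone()[0]  # the plan comes back as one XML document
print(plan_xml)

conn.close()
```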

Arun also looks at running a simple Power BI report off of Azure SQL Data Warehouse; click through for that.


Azure SQL Data Warehouse Architecture

Warner Chaves looks at system views in Azure SQL Data Warehouse:

Unlike the sys.dm_exec_requests view in SQL Server, the sys.dm_pdw_exec_requests view actually keeps up to 10000 records with the information of a request even after it has executed. This capability is very useful as you can track specific query executions as long as their records are still among the 10000 kept by the view. As time passes the oldest records are phased out in favor of more recent ones.
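
A quick way to see that history is to query the view directly. The sketch below pulls the most recent requests and their elapsed times; the server, credentials, and database name are placeholders.

```python
# Minimal sketch: pull recent request history from sys.dm_pdw_exec_requests.
# Unlike sys.dm_exec_requests, this view retains up to 10,000 completed requests.
# Server, credentials, and database name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydw;UID=admin_user;PWD=..."
)
cursor = conn.cursor()

cursor.execute("""
    SELECT TOP 20 request_id, status, submit_time, total_elapsed_time, command
    FROM sys.dm_pdw_exec_requests
    ORDER BY submit_time DESC
""")

for row in cursor.fetchall():
    print(row.request_id, row.status, row.total_elapsed_time, (row.command or "")[:60])

conn.close()
```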

This is an interesting look at some of the differences between Azure SQL Data Warehouse and a “normal” SQL Server installation.  Good reading.


Date Dimension With DAX

Meagan Longoria shows how to create a date dimension using only DAX:

The fiscal calendar assumes calendar months with the fiscal calendar shifted by some number of months. It also assumes that the fiscal year is ahead of the calendar year. That is, fiscal year 2017 starts in the first day of some month in 2016. In my example script, the fiscal year starts in October. If you have some 4-4-5, 4-5-4, or other calendar, the fiscal calendar calculations in this script won’t help you.

To add this date dimension to your SSAS Tabular project:

  1. Right click on the Model.bim file in the Solution Explorer and choose View Code.

  2. If you have at least one table in the model already, locate the end of the definition of the previous table, inside the table array. Add a comma after the end curly brace for the previous table.

  3. Paste the JSON/TMSL from my Gist.

  4. Save and close the file.

  5. Right click on Model.Bim and choose View Designer.

Click through for the script and additional explanation.
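
The fiscal-shift arithmetic the script relies on is easy to get wrong, so here is the same rule as a small Python sketch (not Meagan's DAX, just an illustration of the month-shift logic with an October fiscal year start, as in her example).

```python
# Illustration of the fiscal-shift rule described above (not the DAX from the post).
# With an October start, fiscal year 2017 runs 2016-10-01 through 2017-09-30.
from datetime import date

FISCAL_START_MONTH = 10  # October; change to match your fiscal calendar


def fiscal_year(d: date) -> int:
    # Months on or after the fiscal start month belong to the *next* fiscal year.
    return d.year + 1 if d.month >= FISCAL_START_MONTH else d.year


def fiscal_month(d: date) -> int:
    # October -> 1, November -> 2, ..., September -> 12.
    return (d.month - FISCAL_START_MONTH) % 12 + 1


print(fiscal_year(date(2016, 10, 1)), fiscal_month(date(2016, 10, 1)))   # 2017 1
print(fiscal_year(date(2017, 9, 30)), fiscal_month(date(2017, 9, 30)))   # 2017 12
print(fiscal_year(date(2017, 10, 1)), fiscal_month(date(2017, 10, 1)))   # 2018 1
```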


Azure SQL Data Warehouse Plans

Grant Fritchey shows how to build an execution plan for an Azure SQL Data Warehouse query:

So now we just save this as a .sqlplan file and open it in SSMS, right?

Nope!

See, that’s not a regular execution plan, at all. Instead, it’s a D-SQL plan. It’s not the same as our old execution plans. You can’t open it as a graphical plan (and no, not even in that very popular 3rd party tool, I tried). You will have to learn how to read these plans differently because, well, they are different.

That’s an unfortunate outcome.  Reading is hard…
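
If you do end up staring at one of these, the plan is just XML, so a small script can at least enumerate its steps while you learn the format. The sketch below walks whatever elements are present rather than assuming a fixed schema; the file name is a placeholder.

```python
# Rough sketch: list the steps in a saved D-SQL plan file.
# The plan is XML, so we simply walk the tree and print each element with its
# attributes, indented by nesting depth. "query.dsql" is a placeholder file name.
import xml.etree.ElementTree as ET


def walk(elem, depth=0):
    # Print the element name plus its attributes at each level of the plan.
    print("  " * depth + elem.tag, elem.attrib if elem.attrib else "")
    for child in elem:
        walk(child, depth + 1)


root = ET.parse("query.dsql").getroot()
walk(root)
```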


Documenting A Data Warehouse

Jesse Seymour discusses a few forms of documentation for a data warehouse:

Extended properties are a great way to internally document the data warehouse.  The key advantage here is that the values of these extended properties can be retrieved with a T-SQL query.  This allows us to access this information with a view as needed.  My favorite method of using this is to create an SSRS report that end users can run to look up the attributes and comments I store in the extended property.  Data warehouse tools take some of the pain out of the process.  Unfortunately, not all tools support use of extended properties.  Make sure your tool does or consider changing tools.  Be sure to document the names and use cases for each property you create.  Consistency is the key to the value here.

I’ve never been a big fan of extended properties, mostly because I typically don’t work with tools which expose that information easily.  Regardless, there are other important forms of documentation, so read on.
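
If your tooling does not surface them, the properties are still only a catalog view away. The sketch below reads table-level extended properties from sys.extended_properties; the property name 'Description' and the connection details are assumptions for illustration.

```python
# Minimal sketch: read table-level extended properties from the catalog.
# The property name 'Description' and connection details are assumptions.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=MyWarehouse;Trusted_Connection=yes"
)
cursor = conn.cursor()

cursor.execute("""
    SELECT s.name AS schema_name, t.name AS table_name,
           CAST(ep.value AS NVARCHAR(MAX)) AS property_value
    FROM sys.extended_properties AS ep
    JOIN sys.tables  AS t ON ep.major_id = t.object_id AND ep.minor_id = 0
    JOIN sys.schemas AS s ON t.schema_id = s.schema_id
    WHERE ep.class = 1              -- object-level properties
      AND ep.name = 'Description'
    ORDER BY s.name, t.name
""")

for row in cursor.fetchall():
    print(f"{row.schema_name}.{row.table_name}: {row.property_value}")

conn.close()
```

A query like this is also the natural backing data set for the SSRS lookup report Jesse describes.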


Migrating To Azure SQL Data Warehouse

Rangarajan Srirangam has a detailed article on steps you should take when migrating a database to Azure SQL Data Warehouse:

This article focuses on migrating data to Azure SQL Data Warehouse with tips and techniques to help you achieve an efficient migration. Once you understand the steps involved in migration, you can practice them by following a running example of migrating a sample database to Azure SQL Data Warehouse.

Migrating your data to Azure SQL Data Warehouse involves a series of steps. These steps are executed in three logical stages: Preparation, Metadata migration and Data migration.

It’s a lengthy read, but well worth it.


Getting Started With Azure SQL Data Warehouse

Warner Chaves looks at Azure SQL Data Warehouse:

The first thing to keep in mind is that ASDW was designed to be a cloud based system. As such, it aims to be very flexible for resource allocation and very efficient to scale up or down. To meet those goals, the system allows you to:

  • Increase or decrease compute power, represented by Data Warehousing Units.

  • Grow the amount of storage, which is charged independently from the compute power.

  • Pause compute entirely, so that only storage is paid for at that point.

Warner also has a brand new Pluralsight course on the topic.
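
Scaling the compute side is itself just a T-SQL command against the logical server's master database; a minimal sketch, where the server name, credentials, database name, and target DWU level are all placeholders:

```python
# Minimal sketch: change the DWU level of an Azure SQL Data Warehouse.
# Run against the master database of the logical server; server, credentials,
# database name, and the target service objective are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=master;UID=admin_user;PWD=...",
    autocommit=True,  # ALTER DATABASE cannot run inside a user transaction
)

conn.cursor().execute(
    "ALTER DATABASE mydw MODIFY (SERVICE_OBJECTIVE = 'DW400')"
)

conn.close()
```

Pausing compute, by contrast, is done through the portal, PowerShell, or the REST API rather than T-SQL.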
