Press "Enter" to skip to content

Category: Warehousing

Data Vault 2.0 Models in Microsoft Fabric

Michael Olschimke and Dmytro Polishchuk continue a series:

The last article in this blog series discussed the basic entity types in Data Vault 2.0: hubs, links and satellites. While it would be theoretically possible to limit a model to just these three basic entity types, the resulting Data Vault model would be inefficient: it would most likely consume too much storage, be less efficient due to the many joins, and require a number of grain shifts during information delivery. This is due to certain characteristics in the data that require special treatment.

For these characteristics, Data Vault 2.0 provides special entity types that deal with the specialities. This article focuses on two of them: the non-historized link, which is used to capture transactions and events, and the multi-active satellite, which is used to model multiple active descriptions for the same parent hub or link in the same load.

Read on for an example of how to implement this in a Microsoft Fabric warehouse.

Comments closed

Create and Connect to a Fabric Data Warehouse

Olivier Van Steenlandt builds a warehouse:

In this data recipe series, Microsoft Fabric – Data Warehouse will be explored. As a starting point, a blank Fabric workspace is used. You can sign up for a free Fabric trial by using the following URL: Data Analytics | Microsoft Fabric

In this data recipe, we will create a brand-new Data Warehouse in Fabric. Once created, we will connect to our Data Warehouse using Azure Data Studio.

Click through for the step-by-step process.

Comments closed

Common Warehouse Load Patterns

Ben Johnston continues a series on warehouse load patterns:

This continues and finishes my two-part series on warehouse load patterns. There are many methods to transfer rows between systems from a basic design perspective. This isn’t specific to any ETL tool but rather the basic patterns for moving data. The most difficult part in designing a pattern is efficiency. It has to be accurate and not adversely impact the source system, but this is all intertwined and dependent on efficiency. You only want to move the rows that have changed or been added since the previous ETL execution, deltas. This reduces the network load, the source system load (I/O, CPU, locking, etc.), the destination system load. Being efficient also improves the speed and as a direct result it increases the potential frequency for each ETL run, which has a direct impact on business value.

The pattern you select depends on many things. The previous part of the series covers generic design patterns and considerations for warehouse loads that can be applied to most of the ETL designs presented below. This section covers patterns I have used in various projects. I’m sure there are some patterns I have missed, but these cover the most used types that I have seen. These are not specific to any data engine or ETL tool, but the examples use SQL Server as a base for functionality considerations. Design considerations, columns available, administrative support, DevOps practices, reliability of systems, and cleanliness of data all come into consideration when choosing your actual ETL pattern.

Click through for a compendium of common patterns you can use to indicate that a row should go into a warehouse.

Comments closed

Warehousing and Power BI in Microsoft Fabric

Tomaz Kastrun continues a series on Microsoft Fabric. Day 15 covers building a warehouse:

I have named my as “Advent2023_DWH”.

You can create a warehouse using T-SQL scripts, from data flow gen2, from data pipelines and from the sample data. Let’s select the sample data and grab a coffee.

Day 16 looks at data pipelines:

With the Fabric warehouse created and explored, let’s see, how we can use pipelines to get the data into Fabric warehouse.

In the existing data warehouse, we will introduce new data. By clicking “new data”, two options will be available; pipelines and dataflows. Select the pipelines and give it a name.

And Day 17 provides a primer on how Power BI can read Fabric assets:

Within the Power BI in Fabric, you will find many of the components, that can be used to create a final report. And here are the components:

Comments closed

ML Models and Data Warehouses in Microsoft Fabric

Tomaz Kastrun continues a series on Microsoft Fabric. First up is creating ML models:

Protip: Both experiments and the ML model version look similar, and you can intuitively switch between both of them. But do not get confused, as the ML Model version applies the best-selected model from the experiment and can be used for inference.

Then we switch context to data warehousing:

Today we will start exploring the Fabric Data Warehouse.

With the data lake-centric logic, the data warehouse in Fabric is built on a distributed processing engine, that enables automated scaling. The SaaS experience creates a segway to easier analysis and reporting, and at the same time gives the ability to run heavy workloads against open data format, simply by using transact SQL (T-SQL). Microsoft OneLake gives all the services to hold a single copy of data and can be consumed in a data warehouse, datalake or SQL Analytics.

Comments closed

Constraints in Microsoft Fabric Data Warehouses

Brian Bønk slips out of the constraints:

When working with data and building data models, I personally seldom use the constraints feature on a database. Call me lazy – but I think constraints are adding unnessesary complexity when building data models for reporting. Especially if you are working with the some of new platforms – like Microsoft Fabric, where you are using staleless compute, aka. data storage is seperated from the compute layer.

I understand the need for contraints on other database systems like OLTP systems.

In reporting models it can be somewhat usefull to have constraints between tables, as they help/force you to some level of governance in your datamodel.

But how can we use this in Microsoft Fabric and are they easy to work with?

Read on for those answers. I will note that I’m a stickler about constraints in transactional systems, though I agree that constraints in warehouses are not critical—assuming, at least, that you’re following the Kimball approach and have one and only one mechanism to write data, and that you have other mechanisms for vetting data quality.

Comments closed

Data Warehouse ETL Patterns

Ben Johnston starts a new series:

No matter the ETL tool used, there are some basic patterns to follow when transferring data between systems. There are many data tools and platforms, but the basic patterns remain the same. This focuses on SQL Server, but most of these methods work in any data platform. Even if you are using a virtualization layer, you likely need to prepare the data before exposing it to that engine, which means ETL and data transfers.

Warehouse is very loosely a data warehouse, but the same process applies to other systems. This includes virtualization layers, and to a smaller degree, bulk transfers between transactional systems.

Read on for a few things Ben recommends you have in place before beginning the project, as well as several warehouse loading patterns.

Comments closed

Choosing the Right Technology in the Modern Azure Data Warehouse

Josephine Bush has some advice:

Here’s a quick description of the options we explored:

  • Azure Data Factory – Orchestrates and automates data movement and transformation. You can create workflows, pipelines, and ETL (Extract, Transform, Load) processes using it.
  • Databricks – A unified data science, engineering, and analytics platform based on Apache Spark. It simplifies data exploration, preparation, and machine learning workflows, allowing teams to collaborate efficiently. Interactive notebooks make Databricks a versatile tool for scalable data analysis and processing.
  • Synapse – Integration of big data and data warehousing in the cloud. It facilitates collaborative analytics and AI-driven insights using serverless and provisioned resources across various data sources. Integrated analytics, warehousing, and data integration are part of Synapse’s unified experience.
  • Fabric – An all-in-one analytics solution for enterprises that offers data movement, data lakes, data engineering, data integration, data science, and real-time analytics.

Read on for pros and cons of different options Josephine & crew reviewed, as well as the option they landed on and why.

Comments closed

DAX Time Intelligence with a Fiscal Year Differing from Calendar Year

Olivier Van Steenlandt covers a common case:

Many companies don’t follow the regular Calendar as we know (January 1st – December 31st). They follow their own Financial Calendar (often called Fiscal Calendar) which can start at any time of the year.

Because of this, writing Year-To-Date calculations in DAX for your Tabular Model might seem challenging.

In the step-by-step example, we are working for a company that starts its Financial Year on July 1st.

Read on to see one way to do it. It doesn’t quite solve the problem Olivier brought up, but I’d also make note that having a calendar table with fiscal + calendar year information in it helps remarkably well. It can even handle multiple fiscal year concepts; as an example, a state agency I worked for had a fiscal year on July 1 but the US federal government’s fiscal year begins October 1, so it was just a matter of having StateFiscalYear and FederalFiscalYear columns.

Also, check out Olivier’s new theming, under the Data Cuisine motif.

1 Comment

Data Warehouse Updates for Microsoft Fabric

Dennes Torres brings the news:

We have a specific statement to clone a table. But what exactly does it mean?

The Clone Table feature promises to create an image of the table on a specific point in time or with the current information. The documentation is not precise, because at some points it says it’s only a clone of the structure, but we can see the data on the table.

What’s the advantage of this over a simple SELECT INTO statement?

Read on for that comparison, as well as several other things recently added to Microsoft Fabric data warehouses.

Comments closed