
Category: Warehousing

ML Models and Data Warehouses in Microsoft Fabric

Tomaz Kastrun continues a series on Microsoft Fabric. First up is creating ML models:

Protip: Both experiments and the ML model version look similar, and you can intuitively switch between the two. But do not get confused: the ML model version applies the best selected model from the experiment and can be used for inference.

Then we switch context to data warehousing:

Today we will start exploring the Fabric Data Warehouse.

With its data lake-centric logic, the data warehouse in Fabric is built on a distributed processing engine that enables automated scaling. The SaaS experience creates a segue to easier analysis and reporting, and at the same time gives the ability to run heavy workloads against open data formats, simply by using Transact-SQL (T-SQL). Microsoft OneLake provides all the services to hold a single copy of data, which can be consumed in a data warehouse, data lake, or SQL analytics endpoint.
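
As a quick illustration of that T-SQL story, here is a minimal sketch of querying across items in a workspace via three-part naming. The warehouse table dbo.FactSales and the lakehouse Sales_Lakehouse are hypothetical names, not from Tomaz's post:

    -- Hypothetical names: join a warehouse table to a lakehouse table
    -- in the same workspace via three-part naming, using plain T-SQL.
    SELECT TOP (10)
        c.CountryName,
        SUM(f.SalesAmount) AS TotalSales
    FROM dbo.FactSales AS f
    INNER JOIN Sales_Lakehouse.dbo.DimCountry AS c
        ON c.CountryKey = f.CountryKey
    GROUP BY c.CountryName
    ORDER BY TotalSales DESC;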


Constraints in Microsoft Fabric Data Warehouses

Brian Bønk slips out of the constraints:

When working with data and building data models, I personally seldom use the constraints feature on a database. Call me lazy, but I think constraints add unnecessary complexity when building data models for reporting, especially if you are working with some of the new platforms, like Microsoft Fabric, where you are using stateless compute, i.e. data storage is separated from the compute layer.

I understand the need for constraints on other database systems, like OLTP systems.

In reporting models it can be somewhat useful to have constraints between tables, as they help/force you to keep some level of governance in your data model.

But how can we use constraints in Microsoft Fabric, and are they easy to work with?

Read on for those answers. I will note that I’m a stickler about constraints in transactional systems, though I agree that constraints in warehouses are not critical—assuming, at least, that you’re following the Kimball approach and have one and only one mechanism to write data, and that you have other mechanisms for vetting data quality.
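
One note for reference: Fabric warehouses do accept primary and foreign key constraints, but only as metadata, created NONCLUSTERED and NOT ENFORCED. A minimal sketch against a hypothetical dimension table:

    -- Hypothetical table: Fabric requires the key to be nonclustered
    -- and not enforced, so this documents the relationship without
    -- the engine actually validating uniqueness.
    ALTER TABLE dbo.DimCustomer
    ADD CONSTRAINT PK_DimCustomer
        PRIMARY KEY NONCLUSTERED (CustomerKey) NOT ENFORCED;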


Data Warehouse ETL Patterns

Ben Johnston starts a new series:

No matter the ETL tool used, there are some basic patterns to follow when transferring data between systems. There are many data tools and platforms, but the basic patterns remain the same. This focuses on SQL Server, but most of these methods work in any data platform. Even if you are using a virtualization layer, you likely need to prepare the data before exposing it to that engine, which means ETL and data transfers.

"Warehouse" is used very loosely here to mean a data warehouse, but the same process applies to other systems, including virtualization layers and, to a smaller degree, bulk transfers between transactional systems.

Read on for a few things Ben recommends you have in place before beginning the project, as well as several warehouse loading patterns.
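
One of the most common of those loading patterns is the high-water mark incremental load. Here's a rough sketch, with all table and column names hypothetical rather than Ben's:

    -- Hypothetical watermark pattern: pull only rows modified since the
    -- last successful load, then advance the stored watermark. Assumes
    -- the staging table is truncated before each run.
    DECLARE @LastLoad datetime2 =
        (SELECT MAX(LastLoadDate)
         FROM etl.LoadControl
         WHERE TableName = N'Sales');

    INSERT INTO stg.Sales (SaleID, SaleDate, Amount, ModifiedDate)
    SELECT SaleID, SaleDate, Amount, ModifiedDate
    FROM src.Sales
    WHERE ModifiedDate > @LastLoad;

    UPDATE etl.LoadControl
    SET LastLoadDate = (SELECT MAX(ModifiedDate) FROM stg.Sales)
    WHERE TableName = N'Sales';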


Choosing the Right Technology in the Modern Azure Data Warehouse

Josephine Bush has some advice:

Here’s a quick description of the options we explored:

  • Azure Data Factory – Orchestrates and automates data movement and transformation. You can create workflows, pipelines, and ETL (Extract, Transform, Load) processes using it.
  • Databricks – A unified data science, engineering, and analytics platform based on Apache Spark. It simplifies data exploration, preparation, and machine learning workflows, allowing teams to collaborate efficiently. Interactive notebooks make Databricks a versatile tool for scalable data analysis and processing.
  • Synapse – Integration of big data and data warehousing in the cloud. It facilitates collaborative analytics and AI-driven insights using serverless and provisioned resources across various data sources. Integrated analytics, warehousing, and data integration are part of Synapse’s unified experience.
  • Fabric – An all-in-one analytics solution for enterprises that offers data movement, data lakes, data engineering, data integration, data science, and real-time analytics.

Read on for pros and cons of different options Josephine & crew reviewed, as well as the option they landed on and why.


DAX Time Intelligence with a Fiscal Year Differing from Calendar Year

Olivier Van Steenlandt covers a common case:

Many companies don't follow the regular Calendar as we know it (January 1st – December 31st). They follow their own Financial Calendar (often called a Fiscal Calendar), which can start at any time of the year.

Because of this, writing Year-To-Date calculations in DAX for your Tabular Model might seem challenging.

In the step-by-step example, we are working for a company that starts its Financial Year on July 1st.

Read on to see one way to do it. It doesn't quite solve the problem Olivier brought up, but I'd also note that having a calendar table with both fiscal and calendar year information in it works remarkably well. It can even handle multiple fiscal year concepts; as an example, a state agency I worked for had a fiscal year beginning July 1, but the US federal government's fiscal year begins October 1, so it was just a matter of having StateFiscalYear and FederalFiscalYear columns.
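
For illustration, those fiscal year columns are easy to derive in T-SQL (table and column names here are hypothetical): shift each date forward by the months remaining in its fiscal year, and YEAR() returns the fiscal year directly.

    -- Hypothetical calendar table. Assumes fiscal years are labeled by
    -- their ending year (e.g., July 1, 2023 falls in state FY 2024 and
    -- October 1, 2023 falls in federal FY 2024).
    SELECT
        d.[Date],
        YEAR(d.[Date])                    AS CalendarYear,
        YEAR(DATEADD(MONTH, 6, d.[Date])) AS StateFiscalYear,   -- FY starts July 1
        YEAR(DATEADD(MONTH, 3, d.[Date])) AS FederalFiscalYear  -- FY starts October 1
    FROM dbo.Calendar AS d;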

Also, check out Olivier’s new theming, under the Data Cuisine motif.


Data Warehouse Updates for Microsoft Fabric

Dennes Torres brings the news:

We have a specific statement to clone a table. But what exactly does it mean?

The Clone Table feature promises to create an image of the table at a specific point in time or with the current information. The documentation is not precise, because at some points it says it's only a clone of the structure, yet we can see the data in the table.

What’s the advantage of this over a simple SELECT INTO statement?

Read on for that comparison, as well as several other things recently added to Microsoft Fabric data warehouses.
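
For context, and if memory serves on the syntax, the clone is a single statement, and unlike SELECT INTO it is a zero-copy, metadata-only operation that points at the source table's existing files rather than duplicating them. Table names below are hypothetical:

    -- Hypothetical tables: clone the current state of a table...
    CREATE TABLE dbo.FactSales_Clone AS CLONE OF dbo.FactSales;

    -- ...or clone it as of a prior (UTC) point in time.
    CREATE TABLE dbo.FactSales_Asof AS CLONE OF dbo.FactSales
        AT '2023-11-01T10:00:00.000';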


Handling Source System Deletions in a Warehouse

Rayis Imayev deletes some rows:

When something important disappears, it’s natural to start asking questions and looking for answers, especially when that missing piece has had a significant impact on your life.

Similarly, when data that used to exist in your sourcing system suddenly vanishes without any trace, you’re likely to react in a similar way. You might find yourself reaching out to higher authorities to understand why the existing data management system design allowed this to happen. Your colleagues might wonder if better ways to handle such data-related issues exist. Ultimately, you’ll embark on a quest to question yourself about what could have been done differently to avoid the complete loss of that crucial data.

Kimball-style data warehousing already has the idea of type-2 slowly changing dimensions, which allow you to track the deletion of dimensional data by assigning an end date to the existing row and not inserting a new record with a subsequent start date. It's a little harder to deal with fact data deletions that way, though, as historically there is no concept of slowly changing facts.
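
As a sketch of that dimension-side handling (all object names hypothetical), a soft delete simply closes out the current row for any business key that no longer appears in the source extract:

    -- Hypothetical soft-delete step: end-date current dimension rows
    -- whose business keys are absent from the latest source extract.
    UPDATE d
    SET d.RowEndDate = CAST(SYSUTCDATETIME() AS date),
        d.IsCurrent  = 0
    FROM dbo.DimCustomer AS d
    WHERE d.IsCurrent = 1
      AND NOT EXISTS (
          SELECT 1
          FROM stg.Customer AS s
          WHERE s.CustomerBusinessKey = d.CustomerBusinessKey
      );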

Read on for some thoughts on the topic from Rayis.


Workaround for Primary Keys in Fabric Data Warehouses

Gilbert Quevauvilliers needs a key:

When I started looking into using the data warehouses feature in Fabric, I did see that there were limitations on Primary Key columns.

Below is my blog post on how I still use keys in my data warehouse, instead of using GUIDs, which to me are long and hard to use.

In my example I am going to create a simple data warehouse which is going to consist of two dimension tables (Date and Country) and a fact table with the sales amounts.

This seems sub-optimal, though at least Gilbert shows us a workaround.
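
I don't know Gilbert's exact technique, but a common workaround in engines without IDENTITY support is to generate surrogate keys by offsetting ROW_NUMBER() with the dimension's current maximum key. All names below are hypothetical:

    -- Hypothetical surrogate-key generation without IDENTITY: offset
    -- ROW_NUMBER() by the dimension's current maximum key and insert
    -- only rows not already present.
    INSERT INTO dbo.DimCountry (CountryKey, CountryName)
    SELECT
        mx.MaxKey + ROW_NUMBER() OVER (ORDER BY s.CountryName) AS CountryKey,
        s.CountryName
    FROM stg.Country AS s
    CROSS JOIN (
        SELECT COALESCE(MAX(CountryKey), 0) AS MaxKey
        FROM dbo.DimCountry
    ) AS mx
    WHERE NOT EXISTS (
        SELECT 1 FROM dbo.DimCountry AS d
        WHERE d.CountryName = s.CountryName
    );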


Microsoft Fabric Data Warehouse in a Database Project

Kevin Chant creates a database project:

In this post I want to cover how you can share a Microsoft Fabric Data Warehouse Database Project with the new target platform.

This is now possible thanks to the latest Azure Data Studio Insiders update. You can view the 'Add projects support for Fabric DW' pull request in the public azuredatastudio GitHub repository.

Kevin takes us through creating the database project in Azure Data Studio and then using Azure DevOps or Azure Data Studio to deploy it back out.
