Press "Enter" to skip to content

Category: Warehousing

Building a Type-2 Slowly Changing Dimension

Kristyna Ferris builds a dimension:

This is a blog that I am writing for future me and hopefully it’ll help a few of you save some time too! It’s not often that I get to build out a data warehouse from scratch, but when I do, I want to make sure I do it well with best practices in place. Because this is not something I do a lot of, I frequently forget lessons I’ve learned and have to go back and drop tables to recreate them in the best way before it’s too late. One table type that is vital to do right the first time is a Slowly Changing Dimension Type 2 (SCD2 for short).

Click through for an explanation, as well as example scripts for both SQL Server-adjacent products and the Microsoft Fabric warehouse.

Leave a Comment

Building a Type-6 Slowly Changing Dimension

Dinesh Asanka creates a dimenson:

In a data warehouse, one important concept is to retain historical data. This data is typically not available in operational systems. One approach in data warehouses is the use of Slowly Changing Dimensions (SCDs). What are the SCD options and are there any new approaches?

Click through for a quick depiction of Types 0 through 3, and then where 6 fits into the mix. I’m not 100% sure I’ve ever actually used a Type-6 slowly changing dimension in a production environment, though there are specific circumstances in which one could be quite useful.

Leave a Comment

CLUSTER BY in Microsoft Fabric Data Warehouse

Nikola Ilic shows off a relatively new feature:

The first thing every Fabric architect reaches for in this situation is the usual playlist: check the query plan, look at the joins, validate the statistics, maybe scale up the capacity. All worth doing, but none of those things addressed what was actually happening: the warehouse was scanning the entire table for every filtered query, because there was no way to tell it which Parquet files actually contained the rows we cared about.

However, Microsoft shipped data clustering in preview at the end of November 2025, and the entire conversation changed.

In this article, I want to walk you through what data clustering is, how it works under the hood, and most importantly, I’ll show you a real demo on a 100-million-row clickstream table that you can run in your own warehouse. No abstractions, no marketing numbers, but actual T-SQL you can paste.

Some of the notes Nikola mentions remind me of some of the rules around making columnstore indexes work and for much of the same reason. But as Nikola’s demo shows, this is definitely a “You must be this tall to ride the ride” feature, and unless you’re talking about quite large fact tables with (at a minimum) billions of rows of data, the benefit mostly comes from reducing CUs rather than wall clock time improvements.

Comments closed

Clustering in Fabric Warehouse

Koen Verbeeck speeds things up:

We are building a large warehouse in Microsoft Fabric using the warehouse. Our biggest fact tables have some performance issues when we are running our analytical queries, and it seems we cannot use indexes in the Fabric Warehouse. Is there some way to improve performance?

Click through to see how you can use clustering to improve the performance of warehousing queries, as well as some of the pre-requisites to make it work.

Comments closed

ANY_VALUE() in Fabric Data Warehouse

Jovan Popovic notes a feature going GA:

Fabric Data Warehouse now supports the ANY_VALUE() aggregate, making it easier to write readable, efficient T-SQL when you want to group by a key but still return descriptive columns that are functionally the same for every row in the group.

Right now, this is only available in the Fabric Data Warehouse, so no Azure SQL DB, Managed Instance, or box product support at this time.

Comments closed

Common Star Schema Mistakes

Ben Richardson gets back to basics:

Sometimes the culprit isn’t actually your DAX, it’s your data model.

Star schema mistakes are incredibly common in Power BI, and really hard to track down.

When your data model isn’t a clean star schema, you end up with broken filters, confusing relationships and slow visuals.

It’s important to get it right from the start! So we broke down the top 10 most common mistakes people make, how to identify them and how to fix them!

This is where reviewing (or reading) Ralph Kimball’s Data Warehouse Toolkit can save you a lot of time and stress. The Microsoft data analytics world is predicated so heavily on Kimball-style dimensional modeling that the choices tend to be building a proper star schema up-front or spend processing and developer time trying to fix it in post-production using DAX or trickery.

Comments closed

Tips for Building a Data Warehouse

James Serra gets back to foundations:

I had a great question asked of me the other day and thought I would turn the answer into a blog post. The question is “I’m an experienced DBA in SQL Server/SQL DB, and my company is looking to build their first data warehouse using Microsoft Fabric. What are the best resources to learn how to do your first data warehouse project?”. So, below are my favorite books, videos, blogs, and learning modules to help answer that question:

Click through for James’s recommendations. I strongly agree with his advice to start with Ralph Kimball’s The Data Warehouse Toolkit, and frankly, I think a lot of James’s advice here is sound. The person asking focuses on Fabric, and there are plenty of Fabric-specific things to learn, but at the end of the day, modern data warehouses are still data warehouses.

Comments closed

View Creation via Visual Queries in Microsoft Fabric

Jon Vöge creates a view:

As companies adopt Microsoft Fabric, the distance between backend artifact and Semantic Model is smaller than ever, and it feels more obvious than ever to push some of those local transformations to your Fabric Storage item of choice.

The question is. How do you do that? There are many options:

Read on for those options. Jon focuses on one for people with less database experience.

Comments closed

Microsoft Fabric Warehouse Snapshots now GA

Twinkle Cyril makes an announcement:

Managing data consistency during ETL has always been a challenge for our customers. Dashboards break, KPIs fluctuate, and compliance audits become painful when reporting hits ‘half-loaded’ data. With Warehouse Snapshots, Microsoft Fabric solves this by giving you a stable, read-only view of your warehouse at a specific point in time and now, this capability is Generally Available! Think of this as a true time travel database, an industry-first capability that sets us apart.

I wonder how much they differ from the database snapshots available in SQL Server.

Comments closed

Star Schemas and Keys

Chris Barber provides a primer on the types of keys that are critical for a star schema:

Keys are a core component of star schema modelling; relationships between tables are built using the keys. This article covers:

  1. The main key types
  2. Star Schema diagrams
  3. Best practices when using Keys

An understanding of keys become increasingly important with more complex solutions. Not only do you need to understand them from a modelling perspective, but a common vernacular is required to communicate with team members.

It’s easier to think of the keys Chris describes in two separate classes rather than four unique items. Surrogate and natural keys are descriptors of a primary key (or any other unique/alternate key), after all.

Comments closed