Press "Enter" to skip to content

Category: Warehousing

SCD Types in Microsoft Fabric

Kenneth Omorodion reminds us that the Kimball model is still quite valuable:

In modern data warehousing, how we handle updates to dimension tables is crucial. There are several approaches; but the decision often comes down to two primary strategies: Slowly Changing Dimensions (SCD) Type 2 and overwriting tables. Each has its own benefits, use cases, and trade-offs. This tip will explore the two methods and why SCD Type 2 is often a better option in many data warehouse scenarios.

Read on for this overview of the benefits of type-2 slowly changing dimensions, as well as a little bit of coverage of several other types of slowly changing dimensions.

Comments closed

Create a Case Insensitive Warehouse in Microsoft Fabric

Gilbert Quevauvilliers is speaking my language:

This is a quick blog post to show you how to use a Microsoft Fabric Notebook to quickly and easily create a Case Insensitive Warehouse.

Just a quick note when I talk about a Case Insensitive Warehouse, what that means is that the upper casing and lower casing of column names and text are ignored. By default, Warehouses and Lakehouse’s are case sensitive in Microsoft Fabric.

Case sensitivity is a trap, so I applaud Gilbert’s commitment to excellence here.

Comments closed

Sending E-Mail from SQL Server and Snowflake

Kevin Wilkie says you’ve got mail:

For the many other parameters you can use with sp_sendmail, please refer to the Microsoft Documentation found here.

After running this stored procedure, SQL Server queues the email to be sent. From there, you should check often to see if it was actually sent. (And yes, it can get stuck there forever.)

With, Snowflake, it’s actually fairly close to the same basic principle as SQL Server.

Click through for the syntax for both systems.

Comments closed

Comparing Snowflake vs SQL Server E-Mail Configuration

Kevin Wilkie sends two e-mails:

Today, I want to talk about all the effort that goes into setting up the ability to email in SQL Server and Snowflake.

First is our old friend – SQL Server. I’ll leave this one to the experts at Microsoft. As has been the case over the last few years, they have some great documentation at Learn.Microsoft.com – especially when it comes to SQL Server.

I don’t know anything about sending e-mails via Snowflake (other than what Kevin mentions here), though I imagine a lot of the difference in complexity is that SQL Server allows arbitrary SMTP selection and requires an existing SMTP server.

Comments closed

A Primer on Medallion Architecture in Microsoft Fabric

Kenneth Omorodion builds a warehouse:

Data warehouses are essential components of modern analytics systems, offering optimized storage and processing capabilities for large volumes of data. When integrated with a Lakehouse architecture, you can combine the best of both worlds—structured, schema-enforced data storage with the flexibility and scalability of data lakes. Microsoft Fabric provides an excellent environment for implementing the Medallion Architecture, a design pattern for building efficient data processing pipelines by layering data into bronze, silver, and gold zones.

Click through for the process.

Comments closed

Data Masking in Snowflake: Tagging Policies

Kevin Wilkie creates some tags:

In our last post, we discussed data masking and setting it up for specific columns. Now, I don’t know about y’all, but I deal with hundreds of tables with several columns in each on a daily basis that truly need data masking. If I have to go through and set up a masking policy one by one, I might go crazier than I normally am.

Let’s say that I have an existing table, Employee, in my database that I want to apply a policy. As I’m sure many of you know, any table with Employee data should have at least some of its data masked. The first thing that we need to do is to set up the tag that we’ll be using for, in our case, a column named Salary.

Read on to learn more about how to create tags and why you might want to.

Comments closed

Implementing Role-Playing Dimensions in Power BI

Teo Lachev puts on a mask:

Role-playing dimensions are a popular business requirement but yet challenging to implement in Power BI (and Tabular) due to a long-standing limitation that two tables can’t be joined multiple times with active relationships. Declarative relationships are both a blessing and a curse and, in this case, we are confronted with their limitations. Had Power BI allowed multiple relationships, the user must be prompted which path to take. Interestingly, a long time ago Microsoft considered a user interface for the prompting but dropped the idea for unknown reasons.

Given the existing technology limitations, you have two implementation choices for implementing subsequent role-playing dimensions: duplicating the dimension table (either in DW or semantic model) or denormalizing the dimension fields into the fact table. The following table presents pros and cons of each option:

Click through for that table, as well as some thoughts on viable approaches, including an edge case.

Comments closed

Data Masking in Snowflake

Kevin Wilkie puts on the mask:

The premise behind data masking is fairly simple – make it so that only certain people can see the data in a certain column. For example, social security numbers or credit card numbers probably shouldn’t be seen by everyone who has access to your database.

I like to set it up as follows: (Obviously, this can change depending on your business’ policies.)

  1. Admins have full access.
  2. Call Center Representatives have access to the last 4 digits of the social.
  3. All others see nothing useful.

Read on to see how. I’ll even save my normal “data masking isn’t really a security feature” rant. This time.

Comments closed

Reading Data from Azure Blob Storage in Snowflake

Arun Sirpal explains a common architectural pattern:

Let’s go back to data platforms today and I want to talk about a very common integration I see nowadays, Azure Blob Storage linked to Snowflake via a storage integration which then we can access semi structured files via external tables, it is a good combination of technology I have to say.

Click through for an architecture diagram and example of the code you’d need.

Comments closed

Memoizing Functions with Snowflake

Kevin Wilkie is speaking my language (that is, the language of functional programming):

If you’ve been working with data for several years like I have – mostly using the SQL language – then I have a term for you that other languages, like JavaScript or Python, have had for a few years. The term is “memoizable” and it means, in a nutshell, to remember. A memoizable function caches the results so that it can return the resultset in record time, given the same parameters.

Yeah, it’s a fancy term that basically states, “Instead of calculating the result each time, I’ll just create a lookup table of all possible inputs and what the output is.” It’s really helpful when you have a small number of possible inputs and generating a result takes a while.

Read on to learn more about how this works in Snowflake, including several limitations.

Comments closed