Press "Enter" to skip to content

Category: Warehousing

The Basics of Fact-Dimensional Modeling

Nikola Ilic gives us a primer on Kimball-style fact and dimensional modeling:

Before we come up to explain why dimensional modelling is named like that – dimensional, let’s first take a brief tour through some history lessons. In 1996, a man called Ralph Kimball published a book “The Data Warehouse Toolkit”, which is still considered a dimensional modelling “Bible”. In his book, Kimball introduced a completely new approach to modelling data for analytical workloads, the so-called “bottom-up” approach. The focus is on identifying key business processes within the organization and modelling these first, before introducing additional business processes.

This is a really good overview of the topic, though I’m saddened that “dimensional bus matrix” didn’t make the cut of things to discuss. Mostly because I like the name “dimensional bus matrix.”

Comments closed

IS DISTINCT FROM in Snowflake

Kevin Wilkie remains distinct:

Now, the more fun – “new-ish” – version of the DISTINCT keyword.

Let’s take two values – A and B. Let’s define A = 7 and B = 2.

Snowflake will allow you to ask if A IS DISTINCT FROM B. Thankfully, in this case, it is.

Click through to see how this works. Also note that this syntax is available in SQL Server 2022.

Comments closed

String Casing in Snowflake

Kevin Wilkie is on the case:

When you’re working with a database, it’s very hard to not deal with strings at some point in your journey. There are lots of different functions that you will be working with when you’re working with strings. Today, I want to go over some of the basic ones that you’ll use in Snowflake.

The first two that you’ll deal with make the string either upper or lowercase. Yes, that’s right – you’ve probably figured out the names of the functions already. UPPER() and LOWER() are the 2 functions respectively.

Kevin mentions title capitalization (though not by name) and the quick rule depends on which rulebook you’re using. I grew up with MLA, which I summarize as:

  • Don’t capitalize articles (the, a, an), prepositions, or coordinating conjunctions (for, and, nor, but, or, yet, so)
  • Don’t capitalize “to” when it’s an infinitive (to go, to drive, etc.)
  • Don’t capitalize the second part of a hyphenated phrase if it shows up in the dictionary as one word without a hyphen
  • Capitalize everything else

And a quick bit of advice: understanding title capitalization really does make you look more professional, I promise. Unless we’re using different rulebooks, in which case at least one of us is a heretic.

Comments closed

Thoughts on Fabric Data Warehouse

Teo Lachev continues a series on digging into Microsoft Fabric components:

Continuing our Power BI Fabric journey, let’s look at another of its engines that I personally care about – Fabric Warehouse (aka as Synapse Data Warehouse). Most of my real-life projects require integrating data from multiple data sources into a centralized repository (commonly referred to as a data warehouse) that centralizes trusted data and serves it as a source to Power BI and Analysis Services semantic models. Due to the venerable history of relational databases and other benefits, I’ve been relying on relational databases powered by SQL Server to host the data warehouse. This usually entails a compromise between scalability and budget. Therefore, Azure-based projects with low data volumes (up to a few million rows) typically host the warehouse in a cost-effective Azure SQL Database, while large scale projects aim for Synapse SQL Dedicated Pools. And now there is a new option on the horizon – Fabric Warehouse. But where does it fit in?

Teo gives us some real talk on this one, with plenty of ugly.

Comments closed

Generating Random Data in Snowflake

Kevin Wilkie generates some random data:

One of the many things that the business team asks me to do is to create random-ish data. Thankfully, in Snowflake, there are many ways to make this happen. Today, I want to go thru just a few of them.

Perhaps the one that most people are familiar with is making Snowflake create a random number.

Click through for initial coverage of the RANDOM() function, as well as how you can generate data across a uniform distribution over a given range.

Comments closed

Computed Columns in Snowflake

Kevin Wilkie does the math:

Sometimes to make our lives easier, we, as database engineers, can create a table that automatically tells us the answer as we need it – or at least how we tell it we want it. In SQL Server, we create what is called “Computed Columns.”

Read on to see how to create one of these in Snowflake.

Comments closed

Table Clustering and Search Optimization in Snowflake

Kedhar Natekar continues a series on Snowflake performance:

Clustering does not guarantee improved performance on non-clustered columns. 

If you have frequent queries on non-clustered columns and performance is the key irrespective of cost, then opt for a search optimization service over the entire table of specific columns.

It’s similar to enabling indexing on RDBMS databases like Oracle on specific columns.

Read on to see how these work and get a few tips along the way.

Comments closed

Building a Data Warehouse in Microsoft Fabric

Reza Rad continues a video series on Microsoft Fabric:

Microsoft Fabric Data Warehouse is a database system that stores data in OneLake and provides a medium to interact with the database using SQL commands. The Fabric Data Warehouse, which is also called Data Warehouse, or in short, Warehouse, also provides a powerful computing engine behind the scene to account for large volumes of data and support a fast-performing database system. The term Data Warehouse comes from the fact that this is not usually a place to store transactional data for an operational system (for that, you can use Azure SQL Database). A Data Warehouse, in generic Business Intelligence terminology, is a place where you would store the data that needs to be analyzed.

Reza also explains how the warehouse differs from a lakehouse.

Comments closed

The Current Status of the Lakehouse Architecture

Paul Turley is happy:

When I first started attending conference and user group sessions about Lakehouse architecture, I didn’t get it at first, but I do now; and it checks all the boxes. As a Consulting Services Director in a practice with over 200 BI developers and data warehouse engineers, I see first-hand how our customers – large and small – are adopting the Lakehouse for BI, Data science and operational reporting.

Read on for Paul’s thoughts. My main concern with the strategy has always been performance, with the expectation that it’d take a few years for lakehouse systems to be ready for prime time. We’re getting close to that few years (back in 2020, I believe I estimated 2024-2025).

Comments closed