Press "Enter" to skip to content

Category: Warehousing

IS DISTINCT FROM in Snowflake

Kevin Wilkie remains distinct:

Now, the more fun – “new-ish” – version of the DISTINCT keyword.

Let’s take two values – A and B. Let’s define A = 7 and B = 2.

Snowflake will allow you to ask if A IS DISTINCT FROM B. Thankfully, in this case, it is.

Click through to see how this works. Also note that this syntax is available in SQL Server 2022.

Comments closed

String Casing in Snowflake

Kevin Wilkie is on the case:

When you’re working with a database, it’s very hard to not deal with strings at some point in your journey. There are lots of different functions that you will be working with when you’re working with strings. Today, I want to go over some of the basic ones that you’ll use in Snowflake.

The first two that you’ll deal with make the string either upper or lowercase. Yes, that’s right – you’ve probably figured out the names of the functions already. UPPER() and LOWER() are the 2 functions respectively.

Kevin mentions title capitalization (though not by name) and the quick rule depends on which rulebook you’re using. I grew up with MLA, which I summarize as:

  • Don’t capitalize articles (the, a, an), prepositions, or coordinating conjunctions (for, and, nor, but, or, yet, so)
  • Don’t capitalize “to” when it’s an infinitive (to go, to drive, etc.)
  • Don’t capitalize the second part of a hyphenated phrase if it shows up in the dictionary as one word without a hyphen
  • Capitalize everything else

And a quick bit of advice: understanding title capitalization really does make you look more professional, I promise. Unless we’re using different rulebooks, in which case at least one of us is a heretic.

Comments closed

Thoughts on Fabric Data Warehouse

Teo Lachev continues a series on digging into Microsoft Fabric components:

Continuing our Power BI Fabric journey, let’s look at another of its engines that I personally care about – Fabric Warehouse (aka as Synapse Data Warehouse). Most of my real-life projects require integrating data from multiple data sources into a centralized repository (commonly referred to as a data warehouse) that centralizes trusted data and serves it as a source to Power BI and Analysis Services semantic models. Due to the venerable history of relational databases and other benefits, I’ve been relying on relational databases powered by SQL Server to host the data warehouse. This usually entails a compromise between scalability and budget. Therefore, Azure-based projects with low data volumes (up to a few million rows) typically host the warehouse in a cost-effective Azure SQL Database, while large scale projects aim for Synapse SQL Dedicated Pools. And now there is a new option on the horizon – Fabric Warehouse. But where does it fit in?

Teo gives us some real talk on this one, with plenty of ugly.

Comments closed

Generating Random Data in Snowflake

Kevin Wilkie generates some random data:

One of the many things that the business team asks me to do is to create random-ish data. Thankfully, in Snowflake, there are many ways to make this happen. Today, I want to go thru just a few of them.

Perhaps the one that most people are familiar with is making Snowflake create a random number.

Click through for initial coverage of the RANDOM() function, as well as how you can generate data across a uniform distribution over a given range.

Comments closed

Computed Columns in Snowflake

Kevin Wilkie does the math:

Sometimes to make our lives easier, we, as database engineers, can create a table that automatically tells us the answer as we need it – or at least how we tell it we want it. In SQL Server, we create what is called “Computed Columns.”

Read on to see how to create one of these in Snowflake.

Comments closed

Table Clustering and Search Optimization in Snowflake

Kedhar Natekar continues a series on Snowflake performance:

Clustering does not guarantee improved performance on non-clustered columns. 

If you have frequent queries on non-clustered columns and performance is the key irrespective of cost, then opt for a search optimization service over the entire table of specific columns.

It’s similar to enabling indexing on RDBMS databases like Oracle on specific columns.

Read on to see how these work and get a few tips along the way.

Comments closed

Building a Data Warehouse in Microsoft Fabric

Reza Rad continues a video series on Microsoft Fabric:

Microsoft Fabric Data Warehouse is a database system that stores data in OneLake and provides a medium to interact with the database using SQL commands. The Fabric Data Warehouse, which is also called Data Warehouse, or in short, Warehouse, also provides a powerful computing engine behind the scene to account for large volumes of data and support a fast-performing database system. The term Data Warehouse comes from the fact that this is not usually a place to store transactional data for an operational system (for that, you can use Azure SQL Database). A Data Warehouse, in generic Business Intelligence terminology, is a place where you would store the data that needs to be analyzed.

Reza also explains how the warehouse differs from a lakehouse.

Comments closed

The Current Status of the Lakehouse Architecture

Paul Turley is happy:

When I first started attending conference and user group sessions about Lakehouse architecture, I didn’t get it at first, but I do now; and it checks all the boxes. As a Consulting Services Director in a practice with over 200 BI developers and data warehouse engineers, I see first-hand how our customers – large and small – are adopting the Lakehouse for BI, Data science and operational reporting.

Read on for Paul’s thoughts. My main concern with the strategy has always been performance, with the expectation that it’d take a few years for lakehouse systems to be ready for prime time. We’re getting close to that few years (back in 2020, I believe I estimated 2024-2025).

Comments closed

Recursive Common Table Expressions in Snowflake

Kevin Wilkie is too fancy for simple joins:

Today, I want to talk about that fun edge case when you’re having to join a table to itself in Snowflake. Does it happen often? Not unless your architect just hates you.

Let’s use the normal pieces of data that everyone uses for this kind of thing – employee/manager relationships. We have our employee table that we’ve been working off that we’ll play with for this example.

The syntax is a bit different from T-SQL, but the concept is still the same.

Comments closed

Against Keys in Fact Tables

Marc Lelijveld searches for keys under the lamppost:

Another blog post based on recent client experiences. Last week, I visited a client where we had extensive discussions on data model optimization. As you might know, data modeling in Power BI is one of my favorite topics, so I had an excellent day. It’s also not the first time that I blog about anything data modeling and optimization. If you haven’t read it yet, I recommend reading my previous blog on this topic.

This blog will focus on the need of keys in your tables and primarily your fact tables in your data model. I keep running into data models at customers which are flooded with keys in all tables. For each of them you should ask, do I really need this and could I save it in a different data type for further optimization. In this blog, I will further elaborate on keys in your data model, typical use cases and how these cases can be solved in different manners.

Read the whole thing. The really short version is classic Kimball-style advice: keys for dimensions, not for facts. And in Power BI, removing a unique column from a fact table can speed things up by shrinking the compressed fact table size.

Comments closed