Press "Enter" to skip to content

Category: Warehousing

S3 and Redshift Data Movement with Role Chaining

Sudipta Mitra, et al., talk AWS security:

This post presents an approach that you can apply at scale to achieve fine-grained access controls to resources in S3 buckets and Amazon Redshift schemas for tenants, including groups of users belonging to the same business unit down to the individual user level. This solution provides tenant isolation and data security. In this approach, we use the bridge model to store data and control access for each tenant at the individual schema level in the same Amazon Redshift database. We utilize ASSUMEROLE and role chaining to provide fine-grained access control when data is being copied and unloaded between Amazon Redshift and Amazon S3, so the data flows within each tenant’s namespace. Role chaining also streamlines the new tenant onboarding process.

Read on for an overview and tutorial.
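
For a rough idea of what this looks like in practice, here is a minimal sketch (the role ARNs, group, bucket, and table names are all hypothetical, not taken from the post): Redshift lets you grant the ASSUMEROLE privilege on a comma-separated chain of role ARNs, and the same chain then goes into the COPY or UNLOAD statement.

    -- Allow a tenant's user group to assume this two-role chain, but only for COPY
    GRANT ASSUMEROLE
        ON 'arn:aws:iam::111122223333:role/RedshiftHopRole,arn:aws:iam::444455556666:role/Tenant1S3Role'
        TO GROUP tenant1_group
        FOR COPY;

    -- Load tenant data; Redshift assumes each role in the chain from left to right
    COPY tenant1_schema.sales
    FROM 's3://tenant1-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftHopRole,arn:aws:iam::444455556666:role/Tenant1S3Role'
    FORMAT AS PARQUET;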


Understanding the Data Lakehouse

Tom Jordan explains what data lakehouses are:

When we are thinking about data platforms, there are many different services and architectures that can be used – sometimes this can be a bit overwhelming! Data warehouses, data models, data lakes and reports are all typical components of an enterprise data platform, which have different uses and skills required. However, in the past few years a new architecture has been rising: the data lakehouse. This is an architecture that borrows ideas and concepts from several different areas, which we will be exploring in greater detail in this blog.

Click through to learn more about the origin of this term and how it both draws from and differs from a data lake and a data warehouse.


Role-Based Access Controls in Redshift

Milind Oke, et al., describe RBAC in Amazon Redshift:

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. With Amazon Redshift, you can analyze all your data to derive holistic insights about your business and your customers. One of the challenges with security is that enterprises don’t want to have a concentration of superuser privileges amongst a handful of users. Instead, enterprises want to design their overarching security posture based on the specific duties performed via roles and assign these elevated privilege roles to different users. By assigning different privileges to different roles and assigning these roles to different users, enterprises can have more granular control of elevated user access.

In this post, we explore the role-based access control (RBAC) features of Amazon Redshift and how you can use roles to simplify managing the privileges required by your end-users. We also cover new system views and functions introduced alongside RBAC.

Read on to learn about system-defined roles as well as creating user-customizable roles.
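
If you want a feel for the syntax before diving in, here is a minimal sketch (the role, schema, and user names are invented for illustration): you create a role, grant privileges to the role, and then grant the role to users or to other roles.

    -- Create a role and give it read-only access to one schema
    CREATE ROLE sales_read_only;
    GRANT USAGE ON SCHEMA sales TO ROLE sales_read_only;
    GRANT SELECT ON ALL TABLES IN SCHEMA sales TO ROLE sales_read_only;

    -- Assign the role to a user; roles can also nest inside other roles
    GRANT ROLE sales_read_only TO alice;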


Lakehouse, Mesh, and Fabric

James Serra is back in blue:

(NOTE: I have returned to Microsoft and am working as a Solution Architect in Microsoft Industry Solutions, formerly known as Microsoft Consulting Services (MCS), where I help customers build solutions on Azure. Contact your Microsoft account executive for more info. That being said: the views and opinions in this blog are mine and not those of Microsoft).

There certainly has been a lot of discussion lately on the topic of Data Lakehouse, Data Mesh, and Data Fabric, and how they compare to the Modern Data Warehouse. There is no clear definition of all these data architectures, and I have created a presentation using my own take that I have been presenting frequently internally at Microsoft and externally to customers and at conferences. Hopefully these presentations, blog posts, and videos can help clarify all these data architectures for you:

Click through for several useful resources to help differentiate these topics.


get_json_object and json_tuple in Hive

The Hadoop in Real World team looks at a pair of Hive functions:

Both the get_json_object and json_tuple functions are meant to work with JSON data in Hive.

Let’s create a table with some JSON data to work with. We have created a table named hirw_courses and loaded the JSON text below into the table.

Click through for that script, to see how to use each of get_json_object() and json_tuple(), and to find out which might be better suited for your purposes.
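
As a quick sketch of the difference (hirw_courses is the table from the post, but the json_col column and the JSON keys are assumptions), get_json_object() extracts one value per call using a JSONPath expression, while json_tuple() pulls several keys in a single pass, typically via LATERAL VIEW:

    -- One key per call with get_json_object
    SELECT get_json_object(json_col, '$.title')  AS title,
           get_json_object(json_col, '$.author') AS author
    FROM hirw_courses;

    -- Several keys in one pass with json_tuple
    SELECT t.title, t.author
    FROM hirw_courses
    LATERAL VIEW json_tuple(json_col, 'title', 'author') t AS title, author;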


Bucketing Data in Hive

Chitra Sapkal explains why bucketing in Hive can be so powerful:

When a column has a high cardinality, we can’t perform partitioning on it. A very high number of partitions will generate too many Hadoop files, which would increase the load on the node. That’s because the node will have to keep the metadata of every partition, and that would affect the performance of that node.

In simple words, you can use bucketing if you need to run queries on columns with huge amounts of data, which makes it difficult to create partitions.

Click through to see how bucketing works and examples of how you can use it to make queries faster.
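
For a sense of what that looks like (the table definition here is hypothetical), bucketing hashes a high-cardinality column into a fixed number of buckets, each stored as a file rather than a directory:

    -- Hash user_id into 32 buckets; buckets are files, unlike partition directories
    CREATE TABLE page_views (
        user_id  BIGINT,
        page_url STRING,
        view_ts  TIMESTAMP
    )
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC;

    -- On older Hive versions, bucketing must be enforced explicitly at insert time
    SET hive.enforce.bucketing = true;
    INSERT INTO TABLE page_views
    SELECT user_id, page_url, view_ts FROM raw_page_views;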


Static versus Dynamic Partitioning in Hive

The Hadoop in Real World team explains the difference between two partitioning strategies:

The difference between static and dynamic partitioning exists only while the partitions are being created, based on how the partitions are added to the table. Once the partitions are created, the tables show no difference between static and dynamic partitions. All partitions are treated as one and the same.

Click through for the difference.
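
To make the distinction concrete, here is a small sketch (table and column names invented for illustration): with static partitioning you spell out the partition value yourself, while with dynamic partitioning Hive derives it from the data.

    -- Static: the partition value is hard-coded in the statement
    INSERT INTO TABLE sales PARTITION (sale_date = '2022-06-01')
    SELECT order_id, amount
    FROM staging_sales
    WHERE sale_date = '2022-06-01';

    -- Dynamic: Hive creates partitions from values in the final SELECT column
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    INSERT INTO TABLE sales PARTITION (sale_date)
    SELECT order_id, amount, sale_date
    FROM staging_sales;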


Dedicated SQL Pool Index, Distribution, and Partition Guidance

I have a write-up on the specific value of distributions, indexes, and partitions in Azure Synapse Analytics dedicated SQL pools:

Not too long ago, I ended up taking the DP-203 certification exam for sundry reasons. On that exam, they ask a lot about Azure Synapse Analytics, including indexing, distribution, and partitioning strategies. Because these can be a bit different from on-premises SQL Server, I wanted to cover what options are available and when you might choose them. Let’s start with distributions, as that’s the biggest change in thought process.

Read on for the guidance.
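
As a brief sketch of how all three choices come together in a single CREATE TABLE (the table, columns, and boundary values are hypothetical):

    -- Hash-distribute on the common join key, columnstore for large scans,
    -- and partition by month for easy data lifecycle management
    CREATE TABLE dbo.FactSales
    (
        SaleKey     BIGINT NOT NULL,
        CustomerKey INT    NOT NULL,
        SaleDate    DATE   NOT NULL,
        Amount      DECIMAL(18, 2)
    )
    WITH
    (
        DISTRIBUTION = HASH(CustomerKey),
        CLUSTERED COLUMNSTORE INDEX,
        PARTITION (SaleDate RANGE RIGHT FOR VALUES ('2022-01-01', '2022-02-01', '2022-03-01'))
    );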


Synapse vs Snowflake

Travis Manning has a throw-down:

Data warehousing has become a hot topic for most organizations as data volume grows exponentially, yet the capacity to manage it all manually diminishes. The ecosystem is replete with options, each with a host of features and integrations. In this article, we will discuss two of the most common (and commonly discussed!) data warehousing services, Azure Synapse and Snowflake Data Warehouse (DW). For this article, we will try to focus on use cases and on which option is appropriate in each context.

Click through for the product comparison. One big difference not covered is pricing uncertainty. If you have a good understanding of the number of executions and the computational complexity of your queries, as well as your data quantities, Snowflake can be very competitively priced. But what can happen is that the competitive price turns into a much less competitive price by the time you’re fully up to speed.
