Press "Enter" to skip to content

Category: Warehousing

Handling Source System Deletions in a Warehouse

Rayis Imayev deletes some rows:

When something important disappears, it’s natural to start asking questions and looking for answers, especially when that missing piece has had a significant impact on your life.

Similarly, when data that used to exist in your source system suddenly vanishes without a trace, you’re likely to react the same way. You might find yourself reaching out to higher authorities to understand why the existing data management system design allowed this to happen. Your colleagues might wonder whether better ways to handle such data-related issues exist. Ultimately, you’ll embark on a quest to ask yourself what could have been done differently to avoid the complete loss of that crucial data.

Kimball-style data warehousing already has the idea of type-2 slowly changing dimensions, which allow you to track the deletion of dimensional data by assigning an end date to the row and not inserting a new record with the next start date. It’s a little harder to deal with fact data deletions that way, though, as there has historically been no concept of slowly changing facts.
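
As a rough sketch of the dimension side, a type-2 “soft delete” can be as simple as closing out any current rows whose business keys no longer show up in the source extract. Table and column names here are hypothetical:

-- Close out current rows whose business keys are missing from staging.
-- Unlike an update, no replacement row is inserted; the end date alone
-- marks the deletion.
UPDATE c
SET c.RowEndDate = SYSUTCDATETIME(),
    c.IsCurrent  = 0
FROM dim.Customer AS c
WHERE c.IsCurrent = 1
  AND NOT EXISTS (SELECT 1
                  FROM staging.Customer AS s
                  WHERE s.CustomerBusinessKey = c.CustomerBusinessKey);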

Read on for some thoughts on the topic from Rayis.


Workaround for Primary Keys in Fabric Data Warehouses

Gilbert Quevauvilliers needs a key:

When I started looking into using the data warehouses feature in Fabric, I did see that there were limitations on Primary Key columns.

Below is my blog post on how I still use keys in my data warehouse, instead of using GUIDs, which to me are long and hard to use.

In my example I am going to create a simple data warehouse which is going to consist of two dimension tables (Date and Country) and a fact table with the sales amounts.

This seems sub-optimal, though at least Gilbert shows us a workaround.
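
Gilbert’s exact steps are in the post. As a general sketch, Fabric warehouses accept primary keys only as non-enforced constraints and (as of this writing) have no IDENTITY columns, so one common pattern is to mint surrogate keys with ROW_NUMBER() on top of the current maximum key. Table and column names below are hypothetical:

-- Fabric allows a primary key only as NONCLUSTERED and NOT ENFORCED.
ALTER TABLE dbo.DimCountry
ADD CONSTRAINT PK_DimCountry PRIMARY KEY NONCLUSTERED (CountryKey) NOT ENFORCED;

-- Without IDENTITY, generate surrogate keys at load time.
INSERT INTO dbo.DimCountry (CountryKey, CountryName)
SELECT COALESCE((SELECT MAX(CountryKey) FROM dbo.DimCountry), 0)
       + ROW_NUMBER() OVER (ORDER BY s.CountryName),
       s.CountryName
FROM staging.Country AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DimCountry AS d
                  WHERE d.CountryName = s.CountryName);

Because the constraint is not enforced, the load process itself has to guarantee uniqueness.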


Microsoft Fabric Data Warehouse in a Database Project

Kevin Chant creates a database project:

In this post I want to cover how you can share a Microsoft Fabric Data Warehouse Database Project with the new target platform.

Which is now possible thanks to the latest Azure Data Studio Insiders update. You can view the ‘Add projects support for Fabric DW’ pull request in the public azuredatastudio GitHub repository.

Kevin takes us through creating the database project in Azure Data Studio and then using Azure DevOps or Azure Data Studio to deploy it back out.


Creating a Function in Snowflake

Kevin Wilkie creates a function:

Sometimes, you’ll need to create functions that do a particular thing. They’re not always pretty. Some of them look like they should have been thrown out with the bathwater.

Unlike in SQL Server, user-defined functions in Snowflake can be written in several different languages: Java, JavaScript, Python, Scala, or plain old SQL.

Which means that Postgres users will be a bit more comfortable here than SQL Server users.
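
For a taste of the syntax, here is a minimal Snowflake SQL UDF (a hypothetical example, not one of Kevin’s):

-- Scalar SQL UDF: the body is a single expression.
CREATE OR REPLACE FUNCTION sales_tax(amount NUMBER, rate NUMBER)
RETURNS NUMBER
AS
$$
    amount * rate
$$;

SELECT sales_tax(100, 0.0825);

Swapping in LANGUAGE PYTHON (plus a runtime version and a handler) is what opens the door to the other languages.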


Data Type Conversions and Snowflake Performance

Kevin Wilkie is implicit in this whole thing:

One of the ways we can improve speed is to try several slightly different queries that will (hopefully) get you the same data. Some tables work better with one query, while others work better with another.

Let’s work through a scenario in Snowflake and we’ll see which one is faster under “normal” conditions.

Click through for a few query examples and how they end up performing.
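
To give a flavor of the kind of comparison involved, these two hypothetical queries ask for the same rows, but the first forces Snowflake to implicitly convert the mismatched types before comparing, while the second compares matching types directly:

-- order_id is a NUMBER column (hypothetical table).
-- Implicit conversion: the string literal must be coerced to a number.
SELECT COUNT(*) FROM orders WHERE order_id = '12345';

-- Type-matched comparison: no conversion needed.
SELECT COUNT(*) FROM orders WHERE order_id = 12345;

The general rule of thumb: keep conversions off the column side of a predicate so partition pruning can do its job.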


Visualizing Snowflake Geospatial Data with Power BI

Rebecca O’Connor builds a map:

Power BI can leverage geospatial data from Snowflake with my favourite map visual, Icon Map (https://www.icon-map.com/). Icon Map can render points, polygons, and linestrings using the ‘Well Known Text’ (WKT) format.

Snowflake supports converting geospatial data types to WKT. Not only this, Snowflake has the capability to handle the engineering and analytical needs of geospatial analysis without using any other tool. And the results can be visualised in a variety of media such as Tableau, Hex, Carto, or even a custom-built Streamlit application. I wrote a Streamlit blog on this very recently.

Click through for information on how to get the data shaped in a way that Power BI likes.
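
The key Snowflake piece is turning GEOGRAPHY values into the WKT strings Icon Map consumes, which ST_ASWKT handles (table and column names here are hypothetical):

-- Render each region's shape as Well-Known Text for Power BI.
SELECT region_name,
       ST_ASWKT(region_shape) AS region_wkt
FROM geo.regions;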


The Lakehouse is (Still) Not Enough

Nikola Ilic needs more than a lakehouse:

In the previous parts of the Data Modeling for mere mortals series, we examined traditional approaches to data modeling, with a focus on dimensional modeling and the importance of the star schema for business intelligence scenarios. Now, it’s time to introduce the concept of the modern data platform.

As usual, let’s take a more tool-agnostic approach and learn about some of the key characteristics of the modern data estate. Please, don’t mind if I use some of the latest buzzwords related to this topic, but I promise to reduce their usage as much as possible. 

Lakehouses are getting closer to being good enough, but the performance needs to be there, especially if you eventually end up with virtual data warehouses sitting on top of lakehouse data to serve reporting tools that expect structured fact and dimension tables.
