Press "Enter" to skip to content

Month: July 2024

An Overview of Slowly Changing Dimensions

Reza Rad talks slowly changing dimensions:

If you want to use Power BI, Microsoft Fabric, or any other data analytics tools, one of the key concepts to understand when working with a data warehouse system is the SCD (Slowly Changing Dimension). I will do this in a series of at least two articles. The first one (this one) will be on the concept of what SCD is, its meaning, and its different types. Then, the next one will discuss how to implement SCD types (such as Type 2) using Microsoft Fabric and Power BI.

Reza focuses on SCD types 0-4 but does briefly touch on types 5-7 (of which, I’d never heard of SCD type 7).

Comments closed

Troubleshooting Transactional Replication Latency in SQL Server

Bobirmirzo Arslanov has a two-parter around troubleshooting transactional replication latency issues. Part 1 looks at architecture and some of the distribution tables:

Before you dive into solving any issue, you need to fully understand the type of environment you have as there might have been changes you are unaware of. An easy way to do that is to run script  SQLServer/Script Replication Topology at master · sqlserver-parikh/SQLServer (github.com) which gives output like below.

Part 2 digs in deeper:

If wait time is high compared to CPU time, check wait type and troubleshoot accordingly. For example, on the above example we faced MEMORY_ALLOCATION_EXT wait_type but duration is 0. So, we are not waiting.

If CPU time is higher, this means log thread is running but latency is being observed because you have high load.

Check out both posts. I’d really like to see a similar article for merge replication, as it’s a lot harder to troubleshoot.

Comments closed

Unit Testing a Database

Olivier Van Steenlandt builds some tests:

In the past few years, I learned much about collaborative data warehouse development and deployment automatization by using Database Projects (SSDT) and Azure DevOps (and other tools).

I had my fair share of learning curves, making mistakes, and having great learning opportunities. Lately, I started my next journey to learn about Unit Testing for data warehousing/database development.

In this data cookbook (blog post series), we will discover the wonderful world and different flavors of unit testing from a data perspective. In the coming weeks/months, new data recipes (blog posts) will be released bi-weekly.

This first post provides an overview of the topic and includes links to three tools, though SQL Test is an implementation of tSQLt. Of the three, Visual Studio tests are the best of the bunch, though they’re more integration tests than unit tests.

Comments closed

Script Tasks in Azure DevOps Pipelines

Justin Bird builds a script task:

There are several dedicated script tasks that can be used for different purposes, some of the tasks have shortcut tasks which require much less input and are arguably easier to read. In the examples below, I have included the required inputs needed to run the scripts but in most cases, other optional inputs are available.

This looks pretty similar to the way you’d do it in GitHub Actions workflows as well—no surprise there, considering that the same group of people who built pipelines in ADO handled Actions in GitHub.

Comments closed

One Problem with Scrollbars in Power BI

Chris Webb focuses on the performance aspect of scrollbars on tables:

Concluding my series of blog posts on seemingly harmless things you can do in a Power BI report that can lead to performance problems and the “This visual has exceeded the available resources” error (see also “Calculate(), Filter() and DAX memory usage” and “DAX measures that never return blank“), in this post I’ll show how table visuals with vertical scrollbars that potentially show thousands of rows can be a Bad Thing.

I’d also note the aesthetic problem: the intent of a dashboard is to be glanceable, meaning that a user can gain sufficient understanding of what is happening without needing to click, drag, sort, filter, or otherwise manipulate the dashboard. This means, if you are intending to create a dashboard (versus a report), vertical scrollbars form a second sort of issue: you’re obscuring data that you consider important enough to show to the end user.

Comments closed

Statistics TIME and IO Analysis and Statistics Reporter

Rod Edwards does a bit of analysis:

Again, this is another one of those little bits of functionality that has been present since year dot. Unfortunately, it seems to be forgotten by many in the great game of query investigation.

Being able to see the IO patterns of a query that we’re running to help us see where the heavy lifting is occurring is really useful in allowing us to get the bottom of problems quickly, or see the benefits of our tuning efforts (or otherwise!).

One neat thing I did not know about before was a product called Statistics Reporter, an extension for SQL Server Management Studio that parses the results from time and I/O statistics results. I like Richie Rump’s Statistics Parser website, but the idea of having something built-in is pretty nice.

Comments closed

Variable Evaluation Time in DAX

Marco Russo and Alberto Ferrari evaluate some variables:

Variables are an important element of DAX to improve readability and performance. Variables are created through the VAR keyword, which can be repeated multiple times for multiple variables, followed by RETURN, the keyword that defines the result of the expression.

Read on for several examples of how you can use variables, and even if it’s possible for the engine to ignore variable declarations if the variable in question never sees use.

Comments closed

Tips to Identify Poorly-Performing Code

Rich Benner shares a few tricks:

So, you have a database or block of code. You’ve been told to “make it fast” but you’re not sure where to start. I’ve got you. We’re going to create a process to follow to ensure we can tune effectively and prove that we’ve made things faster. It’s all about having a structure when performance tuning.

Read on for Rich’s recommendations on how to identify poorly-performing code.

Comments closed

Keyset Pagination and Descending Order

Laurenz Albe digs into keyset pagination:

Keyset pagination is the most performant way to retrieve a large result set page by page. However, the neat trick with composite type comparison doesn’t always work. This article explains why and how you can work around that shortcoming.

All of the examples are for Postgres, though in my read-through of the post, I’d say it applies almost as well to SQL Server.

Comments closed