Press "Enter" to skip to content

Curated SQL Posts

DAX UDF Measures vs Calculation Groups for Time Intelligence

Bernat Agullo Rosello compares two capabilities:

Ever since DAX UDFs came out as public preview in September 2025, many DAX developers started wondering how they will compare with calculation groups since both have the centralization of code as one of their main selling points. As pointed out in a recent article by SQLBI they are indeed very different beasts, even though they can be used to achieve very similar outputs.

In short, a calculation group is a model-level object whose items swap one DAX expression for another at evaluation time. Once an item is in the filter context, it applies to every measure being evaluated. A DAX UDF is a smaller object: a named, reusable expression with parameters, callable from any measure but invisible to report users.

Read on to see when calculation groups still make sense and when DAX UDFs are the better choice.

Leave a Comment

Building a Multi-Instance Health Check with Powershell

Garry Bargsley puts it all together:

Over the past four weeks, you’ve built a real foundation. You know that single quotes are literal and double quotes expand variables. You know how to store and use data in variables – strings, integers, booleans, and hashtables. You know how to loop over a collection and do something useful to each item, including handling errors when a server doesn’t cooperate.

Now we put it all together. No new concepts this week, just everything you’ve learned working as a single, practical script that solves a real DBA problem. Think of it as your Jedi trials. The training is done. Time to use the Force.

Click through for the script. It’s fairly simple in terms of what it’s actually doing, but gives you a good idea of the types of things you can do.

Leave a Comment

Adding Patterns to ggplot2 Plots

Zhenguo Zhang adds some patterns:

Adding patterns to plots is a great way to improve accessibility (making plots colorblind-friendly) and to add an extra dimension of information. The ggpattern package provides a rich set of tools to achieve this in ggplot2.

I’m personally not the biggest fan of patterns. I see them as a point of necessity when dealing with grayscale circumstances, such as printing out a chart in an academic journal. But it’s very easy to overdo patterns and end up making a mess of the visual.

But one side note about color vision deficiency and plots: make sure that your plots are monochrome-friendly because somebody probably will try to print out your chart or view it on a grayscale-only device. Or might actually be monochromatic.

Leave a Comment

The Disappointment of Parameter-Sensitive Plan Optimization

Hugo Kornelis is back with another video:

Bad hair day? Try having an almost-no-hair month!

Jokes aside. It has been almost six weeks since my last video blog. Not really the schedule I had planned. But I believe I have good reasons.

Anyway, I do have a new video ready now. As promised in my last video, I now cover Parameter Sensitive Plan Optimization (PSPO), a new feature, introduced in SQL Server 2022, that is supposed to alleviate the pain of bad parameter sniffing.

Read on for the promise and letdown of PSPO.

Leave a Comment

An Overview of Power BI File Formats

Reza Rad covers a multitude of formats:

Power BI files come in multiple formats. If you have been working with Power BI Desktop for a while, you probably know the PBIX format well. But there are other formats too: PBIT, PBIP, and PBIR. The last two are more recent additions, and there is also a language called TMDL (Tabular Model Definition Language) that comes with its own file type.

In this blog post, I am going to explain the differences between all of these formats, why each one exists, and which one you should use for your scenarios. I will also show you, through a practical demo, what PBIP looks like and how you can use it to make your development process significantly better.

Click through for the video and article.

Leave a Comment

Foreign Tables and Materialized Views in PostgreSQL

Richard Yen provides a write-back system for analysts:

I recently wrote a post about WAL log shipping and how a standby built on log shipping is a great way to give data analysts production data without putting the primary at risk. Having access to the production data in this way is great, but it’s read-only. How can we create views of this data for better analytics work? I want to make the case today that Foreign Data Wrappers and Materialized Views can make a great solution – not only in accessing production Postgres data, but also working with other data sources.

Click through for an architectural discussion of how they can work together.

Leave a Comment

An FAQ-Based Introduction to Data Factories in Azure

Koen Verbeeck answers some questions:

Is Microsoft Fabric replacing Azure Data Factory?

Officially, no. Or maybe not yet. At the time of writing, ADF still remains a separate product but it’s noticeable that more new features are added to Fabric than to ADF. There are still many customers using ADF, so Microsoft might keep the service around for a while. There’s also still a bit of a feature gap between the two services, but this becomes more narrower each month. Microsoft is offering migration scenarios from ADF to Fabric.

I picked this question because of how much the concept annoys me. There are three separate Data Factory code bases in Azure with overlapping but not matching functionality (which is how you can tell it’s multiple code bases and not just one code base reskinned). This can lead to a scenario where Person A says, “Oh, do this thing in Data Factory.” Person B then says, “But I can’t do that in Data Factory.” Person A’s response: “Oh, that’s weird, because I can do it in Data Factory.” This leads to necessary but somewhat absurd clarifications around how you need to use Microsoft Fabric Data Factory, not Azure Data Factory because, even though Microsoft Fabric Data Factory is hosted in Azure, it’s a different product.

And don’t get me started on the wide variety of KQL platforms, all of which are subtly different.

Leave a Comment

Concurrent Evaluation with Microsoft Fabric Dataflows Gen2

Chris Webb runs multiple jobs at once:

Did you know that if your Fabric Dataflows Gen2 contains several queries then you can control how many of them are evaluated in parallel when your dataflow refreshes? In this series I’ll look at how how you can do this and how it may result in better performance – at least in some cases.

Let’s start with the basics. I created a Dataflow Gen2 with ten queries which each returned a table of one row and one column after one minute. I used the #table function to generate the table without connecting to a data source, code from this post to add the delay and the trick in this post to make sure the delay was only applied when the dataflow refreshed. The output of each query was loaded to a Fabric Warehouse.

Click through for a demonstration.

Leave a Comment

Estimating Probabilities from Unevenly Collected Data

Nina Zumel answers an important question:

In this article, we look at the problem of estimating and comparing probabilities about a population of subjects from unevenly collected observations. Some examples might include:

  • The perceived quality of a movie (how often is a movie positively reviewed) when some movies have far more reviews than others.
  • The effectiveness of various ad campaigns, when some compaigns have had more exposure than others.
  • The efficacy of a certain medical procedure by hospital, when some hospitals have had more cases than others.

For our specific task, we’ll try to estimate the “innate” batting ability (the probability of making a hit when at bat) of major league baseball players in 2023. For the sake of this article, we will take this single season of data as everything that we know about these players and their batting statistics.

It’s an interesting problem because she’s looking at 2023 data as an estimation of the player’s entire career, with the goal of estimating how a player will perform overall given a fairly reasonably sized sample of information collected from one relatively short period of that player’s career. H/T John Mount.

Leave a Comment

A Challenge of Visualizing Game Statistics

Kieran Healy scratches his head:

I just finished driving a very long way up the side of the country, so I’m kind of tired. But even allowing for that, boy, this way of representing things really is quite confusing. Not being an Apple Sports user I had to look at it for a bit to understand what was happening. But, now that it has given me a headache, I can kind of see why whoever designed this ended up in the undoubtedly bad place they did.

Before I get to why I have some sympathy for the designer, why did I find this representation of these numbers so disorienting? It’s not just just because I’ve been driving for nine hours. John is right to call the picture a “Zero Sum” representation. The design strongly suggests to the viewer that, within each row, we’re looking at each team’s share of a total. Each pair of black and blue lines seem to be vying for control of their whole row, with the longest line being the “winner” in each case.

Click through for the challenge, as well as a trio of attempts to improve the results. The tornado chart at the end is probably what I’d go with if I needed to include all of this on a single chart. H/T R-Bloggers.

Leave a Comment