Category: Performance Tuning

Partitioned Compute with Fabric Dataflows Gen2

Chris Webb continues to test out Fabric Dataflow Gen2 performance:

In the first part of this series I showed how the Concurrency setting in a Fabric Dataflows Gen2 can affect refresh performance when there are multiple queries inside the dataflow. In this post I will show how, with Partitioned Compute, this setting can also affect the performance of a single query within a dataflow.

To test this I created a dataflow with one query, a modified version of the query that I used in this post from earlier this year which returns a table with ten rows and calls a function with a built-in delay of 60 seconds on each row.

This is a preview feature, and Chris shows a simple but effective test to demonstrate how the capability works.
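
As a back-of-the-envelope companion to the test described above (ten rows, a 60-second delay per row), the sketch below estimates wall-clock time for different concurrency values. It assumes rows are processed in perfectly parallel waves with no scheduling or startup overhead; the function name and the model itself are illustrative assumptions, not a description of how Fabric actually schedules partitions.

```python
import math

def estimated_refresh_seconds(rows: int, delay_seconds: int, concurrency: int) -> int:
    """Idealized wall-clock time: rows are handled in waves of `concurrency`."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    waves = math.ceil(rows / concurrency)  # number of sequential batches
    return waves * delay_seconds

# Ten rows, 60 seconds each, at a few hypothetical concurrency levels.
for concurrency in (1, 2, 5, 10):
    print(concurrency, estimated_refresh_seconds(10, 60, concurrency))
```

Real refresh times will be higher than this lower bound, so measured results like the ones in the linked post are what count.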

Filtered Indexes in SQL Server

Erik Darling has a new video:

Now, you just can’t talk about indexing in SQL Server really without talking about filtered indexes. They are a very, very important thing. Conceptually, they are just not that hard to figure out.

It’s an index with a where clause. It only indexes some of the data. It qualifies for the where clause. I don’t know. Like the benefits of that just seem rather apparent to me.

Benjamin Franklin highly encourages you to watch this video, even though filtered indexes are one of the most frustrating things in SQL Server. There are so many cases where I think they should work, and they actually work in approximately a third of those cases.
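
To make the "index with a where clause" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite calls these partial indexes, and its matching rules differ from SQL Server's filtered indexes, so treat this as an analogy rather than a SQL Server demo. The table, column, and index names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 100, "open" if i % 50 == 0 else "closed") for i in range(10_000)],
)

# The index only contains rows that satisfy its WHERE clause.
conn.execute(
    "CREATE INDEX ix_orders_open ON orders (customer_id) WHERE status = 'open'"
)

def plan(sql: str) -> str:
    """Return the planner's description of how it would run the query."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The predicate matches the index filter, so the index is a candidate.
matching = plan("SELECT id FROM orders WHERE status = 'open' AND customer_id = 0")
# The predicate does not imply the index filter, so the index cannot be used.
non_matching = plan("SELECT id FROM orders WHERE status = 'closed' AND customer_id = 0")

print(matching)
print(non_matching)
```

The second query illustrates the frustrating part: the index only helps when the optimizer can prove the query's predicate falls inside the index's filter.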

Polymorphic Associations in Postgres

Andrei Lepikhov has multiple types:

Planning such a query efficiently is no easy task — and in my experience, this is confirmed by user reports from the 1C world, since PostgreSQL is currently not rich in LEFT JOIN optimisations. At the same time, the properties of this pattern enable the development of various techniques to improve execution efficiency. I’ve managed to implement several straightforward optimisations of this template. But first, let’s understand what polymorphic references actually are, where they come from, and how common they really are. That’s the gap I’m trying to fill with this post.

Click through for the explanation. This isn’t the easiest problem to solve in the relational world, though I do tend to prefer the subclass/superclass solution, myself.

CLUSTER BY in Microsoft Fabric Data Warehouse

Nikola Ilic shows off a relatively new feature:

The first thing every Fabric architect reaches for in this situation is the usual playlist: check the query plan, look at the joins, validate the statistics, maybe scale up the capacity. All worth doing, but none of those things addressed what was actually happening: the warehouse was scanning the entire table for every filtered query, because there was no way to tell it which Parquet files actually contained the rows we cared about.

However, Microsoft shipped data clustering in preview at the end of November 2025, and the entire conversation changed.

In this article, I want to walk you through what data clustering is, how it works under the hood, and most importantly, I’ll show you a real demo on a 100-million-row clickstream table that you can run in your own warehouse. No abstractions, no marketing numbers, but actual T-SQL you can paste.

Some of the notes Nikola mentions remind me of the rules around making columnstore indexes work, and for much the same reason. But as Nikola’s demo shows, this is definitely a “You must be this tall to ride the ride” feature, and unless you’re talking about quite large fact tables with (at a minimum) billions of rows of data, the benefit mostly comes from reducing CUs rather than wall clock time improvements.
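
The file-skipping idea behind data clustering can be sketched in a few lines of Python. This is a toy model under stated assumptions: each "file" is a fixed-size chunk of values with recorded min/max metadata, and sorting stands in for clustering. None of the names below come from Fabric, and the real engine's layout and pruning logic will differ.

```python
import random

def build_files(values, rows_per_file):
    """Split values into fixed-size 'files' and record each file's min/max."""
    files = []
    for start in range(0, len(values), rows_per_file):
        chunk = values[start:start + rows_per_file]
        files.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return files

def files_to_scan(files, low, high):
    """Keep only files whose [min, max] range overlaps the filter range."""
    return [f for f in files if f["max"] >= low and f["min"] <= high]

random.seed(42)
values = [random.randrange(1_000_000) for _ in range(100_000)]

unclustered = build_files(values, rows_per_file=1_000)          # arrival order
clustered = build_files(sorted(values), rows_per_file=1_000)    # ordered by the filter column

low, high = 250_000, 260_000
unclustered_hits = files_to_scan(unclustered, low, high)
clustered_hits = files_to_scan(clustered, low, high)

print(len(unclustered_hits), "of", len(unclustered), "files scanned without clustering")
print(len(clustered_hits), "of", len(clustered), "files scanned with clustering")
```

With randomly ordered data nearly every file's min/max range spans the filter, so almost nothing can be skipped; once the data is ordered on the filter column, only a handful of files overlap.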

Partitioning and Columnstore Indexes

Erik Darling puts together a great combination for a very large dataset:

So, today we’re going to talk about partitioning in columnstore because there are important differences between partitioned columnstore tables and partitioned rowstore tables. One of the sort of superpowers that columnstore has is the ability to use metadata about which row groups have which data in them, and it can skip entire segments that do not contain relevant data.

I agree with Erik’s point that you do need around 500 million or so rows before this capability really shines, but if you do pick the right partition key, you get one of those rare and coveted performance improvements from partitioning.

Migration Regret: SQL Server to Postgres Edition

Tim Radney provides an important reminder:

As a data nerd who’s spent the last 25+ years helping organizations keep their databases running smoothly, I’ve had this conversation more times than I can count: “We’re moving to Postgres to save on licensing costs.” It sounds great on paper, open source, no vendor lock-in, and those big SQL Server license fees go away. But lately, I’m hearing a different story from DBAs and architects after the migration is done. They’re calling it Post Regret. That sinking feeling when the promised savings evaporate, performance tanks, and the team realizes they might have been better off staying put (or at least doing a lot more due diligence).

If you’re considering a SQL Server to PostgreSQL migration (or already knee-deep in one), this post is for you. I’ll break down what Post Regret looks like in the real world, why it happens so often, and how to avoid becoming the next cautionary tale. I’ve seen it play out in enough environments to spot the patterns.

Click through for Tim’s tales of woe. Importantly, none of it is a knock on Postgres or a knock on SQL Server. It’s the fact that these are two separate products whose tuning options are very different. You can successfully migrate from one to the other, but to do so, you really need to have a great understanding of both platforms at scale, not just at the tutorial level.

Risks of Using PostgreSQL as a Job Queue

Richard Yen explains why it’s not the best tool for the job:

At small scale, using Postgres as a job queue is totally fine, and I’d even say it’s the right call. Fewer moving parts, one less system to manage, ACID guarantees on your jobs. What’s not to love?

The problem is that “small scale” has a ceiling, and the ceiling is lower than most people expect. When you’ve got thousands of concurrent workers hammering a jobs table with SELECT ... FOR UPDATE SKIP LOCKED, things start to behave in ways that aren’t obvious from the application layer. CPU usage creeps up. Also vacuum sometimes can’t keep up. Finally, in the wait event stats, you start seeing ominous entries like LWLock:MultiXactSLRU stacking up across many backends.

This pattern has tripped up teams more than a few times, and it usually plays out the same way: everything works fine in dev and staging, then goes off a cliff in production once the concurrency gets real. So let’s dig into why this happens, and what the alternatives look like.

Click through for more information. It’s the same on the SQL Server side of the house: once you hit a concurrency threshold, performance falls off a cliff.
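
For readers unfamiliar with the claim pattern behind SELECT ... FOR UPDATE SKIP LOCKED, here is an in-memory Python analogy. It uses threading locks as stand-ins for row locks and shows only the semantics (a worker never waits on a job another worker holds). It does not talk to Postgres and does not reproduce the MultiXact or vacuum behavior the post describes; the Job class and function names are hypothetical.

```python
import threading

class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.lock = threading.Lock()  # stands in for a row-level lock
        self.done = False

def claim_next(jobs):
    """Return the first unfinished job whose lock is free, skipping locked ones."""
    for job in jobs:
        # Non-blocking acquire mirrors SKIP LOCKED: never wait on a held lock.
        if job.lock.acquire(blocking=False):
            if not job.done:
                return job  # caller keeps the lock until the work is finished
            job.lock.release()
    return None

def worker(jobs, processed, processed_lock):
    while True:
        job = claim_next(jobs)
        if job is None:
            return  # everything left is finished or held by another worker
        job.done = True  # "process" the job while holding its lock
        with processed_lock:
            processed.append(job.job_id)
        job.lock.release()

jobs = [Job(i) for i in range(100)]
processed, processed_lock = [], threading.Lock()
threads = [
    threading.Thread(target=worker, args=(jobs, processed, processed_lock))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(processed), "jobs processed,", len(set(processed)), "distinct")
```

Note that every worker rescans the job list from the top on each claim, which hints at why this pattern gets more expensive as worker counts grow.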

Applying the Pareto Principle to Query Store

Erik Darling channels an Italian economist:

In this video, I dive into an old-school approach to identifying SQL Server performance issues using SP Quickie Store and a novel method inspired by the Pomodoro [“Pareto” – ed] technique. Traditionally, Query Store surfaces queries that consumed a lot of CPU over the last seven days, but often these results are too broad for practical use. To address this, I’ve developed a multi-dimensional scoring system that evaluates queries based on their impact across several key metrics: CPU usage, duration, physical reads, writes, and executions. This approach helps pinpoint the most problematic queries more accurately, even when they run outside of typical working hours or are unparameterized. By sharing these insights, I hope to provide a practical tool for SQL Server administrators looking to optimize their databases without relying solely on modern monitoring tools.

The AI generated summary reminds me that I’ve been working for 25 minutes, so time to take a break.

I like the idea of calculating and displaying impact scores, as well as breaking them down into their core components.
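
To give a feel for what a multi-metric impact score could look like, here is a small Python sketch. It is not Erik's scoring method: the formula (each query's average share of the workload total across the five metrics named in the summary) and the sample numbers are assumptions chosen purely for illustration.

```python
METRICS = ("cpu_ms", "duration_ms", "physical_reads", "writes", "executions")

def impact_scores(queries):
    """Score each query by its average share of the workload total across METRICS."""
    totals = {m: sum(q[m] for q in queries.values()) for m in METRICS}
    scores = {}
    for query_id, stats in queries.items():
        # Skip metrics with a zero total to avoid dividing by zero.
        shares = [stats[m] / totals[m] for m in METRICS if totals[m] > 0]
        scores[query_id] = sum(shares) / len(shares) if shares else 0.0
    # Highest impact first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Made-up workload: q1 is CPU-heavy, q2 is read- and execution-heavy, q3 is write-heavy.
queries = {
    "q1": {"cpu_ms": 9000, "duration_ms": 12000, "physical_reads": 500, "writes": 10, "executions": 20},
    "q2": {"cpu_ms": 500, "duration_ms": 800, "physical_reads": 4000, "writes": 0, "executions": 5000},
    "q3": {"cpu_ms": 500, "duration_ms": 1200, "physical_reads": 500, "writes": 90, "executions": 30},
}

scores = impact_scores(queries)
for query_id, score in scores.items():
    print(query_id, round(score, 3))
```

Sorting by CPU alone would bury q2, even though it dominates reads and executions; a blended score keeps it near the top.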

Increasing CPU Capacity or Tuning Queries

John Deardurff explains how to make a choice:

Recently while discussing the Task Execution Model and Thread Scheduling, I was asked the following question, When discussing worker threads, how can we determine whether we should increase CPU capacity or focus on query tuning? This is when our worker threads are under pressure and the instance is becoming exhausted?

In my brain, I thought, that is a great question, and it’s exactly the right way to think about worker thread pressure vs. real CPU starvation, especially when worker threads are getting tight. Let’s write a post.

John has a nice discussion of the trade-offs and signals associated with each approach. A third approach I might add is caching in the application(s), if applicable. This is especially useful if a significant fraction of the queries access static or nearly-static data.
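
As a rough illustration of the application-side caching idea, here is a minimal time-to-live cache in Python. The class, the loader function, and the five-minute TTL are all hypothetical; a production system would more likely use an existing caching library or service and would need an invalidation strategy.

```python
import time

class TTLCache:
    """Tiny time-to-live cache; `clock` is injectable so the example is testable."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database round trip
        value = loader()  # miss or expired: run the real query
        self._entries[key] = (now + self.ttl_seconds, value)
        return value

calls = []

def load_country_codes():
    # Hypothetical stand-in for a database query against nearly-static data.
    calls.append(1)
    return ["US", "CA", "MX"]

cache = TTLCache(ttl_seconds=300)
cache.get_or_load("country_codes", load_country_codes)
cache.get_or_load("country_codes", load_country_codes)
print(len(calls))  # prints 1: the second lookup was served from the cache
```

Every cache hit is a query the instance never has to schedule on a worker thread, which is why this helps most when the same static data is requested repeatedly.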

Tracking Resource Utilization in Performance Monitor

Erik Darling has another video tutorial:

In this video, I delve into the resource metrics tabs of the full dashboard from my free SQL Server performance monitoring tool. I explain how these tabs provide more detailed information compared to the lighter version, including CPU utilization, TempDB usage, memory usage, and a selection of perfmon counters. The goal is to help you quickly identify when your server might be experiencing issues by showing real-time data and historical trends that can pinpoint specific performance bottlenecks. I also highlight how the weight [“wait” – ed] stats section helps in understanding wait statistics, making it easier to diagnose problems related to resource contention and query compilation.

Click through for the video.
