Press "Enter" to skip to content

Curated SQL Posts

Benchmarking Kafka

Jack Vanlightly continues a series on Dimster. First up is a benchmark of consumer groups versus share groups:

In this first share group benchmarking post, we’re going to use share groups as they are not intended to be used, but for a good reason. Share groups allow you to move past partitions as the unit of parallelism by allowing multiple consumers to read from the same partition, using message queue semantics. We’ll run those kinds of tests in the next post. In this post I just want to understand if the mechanics of how share groups work add any additional overhead compared to consumer groups. So we’ll use share groups as if they were consumer groups (by capping consumer count to partition count).

Objective: Use synthetic tests to measure the overhead of share groups compared to consumer groups in identical conditions.

After that, Jack simulates processing time:

In this post we’re going to simulate processing time in the consumers to make these benchmarks more realistic and show the utility of share groups (namely the ability to parallelize processing beyond the partition count).

We’ll see how the following two configurations play an important role in parallelizing consumption with share groups:

  • max.poll.records (consumer config)
  • group.share.partition.max.record.locks (broker-side config)

And there’s one more post in the series so far:

In the last post we used simulated consumer processing time to reveal how important it is to set an appropriate value for max.poll.records to ensure the consumer parallelism that we expect. With a uniform distribution of messages over partitions, the rule of thumb was a value somewhat lower than:

group.share.partition.max.record.locks / number of consumers per partition

But there’s more to parallel consumption than max.poll.records. The size of producer batches also plays a role when using the default share.acquire.mode (batch_optimized).

Stay tuned for the next post in the series.

Leave a Comment

Five Locking Behaviors to Watch for in PostgreSQL

Shinya Kato has a list:

PostgreSQL uses MVCC (Multi-Version Concurrency Control) for concurrency control: reads never block writes, and writes never block reads.

Its locking system has 8 table-level lock modes and 4 row-level lock modes, and the conflict tables in the documentation tell you exactly which lock modes conflict with which.

In practice, though, once you actually operate PostgreSQL, locks end up conflicting in places you never expected. Queries take far longer than anticipated, and in the worst case you end up with an outage.

Click through for five notes around locking that may not be immediately apparent.

Leave a Comment

PSBlitz 6.1.0 Released

Vlad Drumea has a new version of PSBlitz:

For anyone not familiar with PSBlitz: it’s a PowerShell-based tool that outputs SQL Server health and performance diagnostics data to either Excel or HTML, and saves execution plans and deadlock graphs as .sqlplan and .xdl files.

If you’re familiar with Oracle, this is pretty much my attempt of a SQL Server equivalent to Oracle’s AWR report.

Feel free to check out the blog post announcing PSBlitz 4 years ago, as well as the 3 year anniversary post.

Read on to see what Vlad has added.

Leave a Comment

Storage of Memory-Optimized Columnstore Indexes

Hugo Kornelis joins a pair of technologies:

Time for the next part in my series on storage structures. The previous parts covered on-disk rowstorecolumnstore indexes, and memory-optimized storage. In this part, I will look at the combination of the latter two: memory-optimized columnstore indexes.

Memory-optimized columnstore indexes were introduced in SQL Server 2016. I’ve seen some slick Microsoft marketing sessions in that time that were big on “real-time operational analytics”. A new trend where analytical processing would no longer be done on a stale copy of the data in a separate data warehouse, but directly on the OLTP database. Reports would always be fully current, there would be no more need for an ETL pipeline, and due to the combination of memory-optimized for OLTP workloads and columnstore for analytical processing, everything would always be fast. In theory.

Yeah, this one kinda fizzled out quickly. It was in line with the HTAP craze from about the same time period. And Hugo shows in this post part of why very few companies ever adopted it.

Leave a Comment

Creating a Private Link in Microsoft Fabric

Gilbert Quevauvilliers locks down a Fabric environment:

In this blog post, I am going to walk through how I created a Private Link in Microsoft Fabric so that access routes over Microsoft’s private backbone network instead of the public internet.

NOTE: It is a bit of a longer blog post as there are quite a few things that need to be configured.

NOTE II: In the next blog post I will show you how to connect from a Public Workspace to a Private Link Workspace using a Managed Private Endpoint

I wanted to document the end-to-end process because there are a few moving parts across Microsoft Fabric and the Azure Portal. The key items I needed to configure were the Fabric tenant setting, the Fabric private link service, the virtual network, a test virtual machine, the private endpoint, and finally DNS testing from inside the virtual network.

There’s a lot more here than ticking a couple of checkboxes or selecting a few radio button options.

Leave a Comment

SQL Server Permissions and Privilege Escalation

Vlad Drumea explains how powerful a few specific roles are:

Privilege escalation is a method in which an attacker gains unauthorized privileged access into a system.
This is done by exploiting misconfigurations, design flaws or unpatched vulnerabilities.

The most straightforward example in SQL Server land being someone “promoting themselves” from members of the public fixed server role, to members of the sysadmin role.

Read on for these three roles and what you can do with them.

Leave a Comment

Full-Text Indexes and SQL Server 2025

Rich Benner rebuilds indexes:

The Full-Text Engine manages full-text indexes. The engine splits your text columns into individual terms and builds an inverted index, mapping each term back to the rows in which it appears. Unlike a standard B-tree index, the structure lives outside the normal index internals and is maintained asynchronously via a background process called a crawl. Effectively, each word in your text string ends up indexed, rather than the string as a whole. This makes certain types of searches much more efficient (you need to use search terms like CONTAINS() to utilize full text indexes).

If you have to search strings like this then full-text indexes can be very effective. String searching isn’t great in SQL Server, but this is definitely a tool in your belt if you have a requirement that makes it useful.

Because of some changes to the way full-text indexing works in SQL Server 2025, there is a post-upgrade maintenance task you’ll have to perform.

Leave a Comment

Explaining PostgreSQL Large Record Storage with TOAST

Radim Marek looks off-page:

In earlier posts in this series we established that every heap tuple lives inside a strict 8KB page. Everything else is built on top of that hard limit: MVCCHOT updates, and indexes that point at (page, line_pointer). And yet this still works:

CREATE TABLE docs (id int PRIMARY KEY, body jsonb);
INSERT INTO docs VALUES (1, (SELECT jsonb_agg(g) FROM generate_series(1, 100000) g));

That body value is somewhere north of half a megabyte. The heap page is still 8KB. Both statements are true at the same time, and the mechanism that makes them coexist is TOASTThe Oversized-Attribute Storage Technique.

Read on to see how TOAST works, when it kicks in, and some of the consequences of this solution.

Leave a Comment

DAX UDF Measures vs Calculation Groups for Time Intelligence

Bernat Agullo Rosello compares two capabilities:

Ever since DAX UDFs came out as public preview in September 2025, many DAX developers started wondering how they will compare with calculation groups since both have the centralization of code as one of their main selling points. As pointed out in a recent article by SQLBI they are indeed very different beasts, even though they can be used to achieve very similar outputs.

In short, a calculation group is a model-level object whose items swap one DAX expression for another at evaluation time. Once an item is in the filter context, it applies to every measure being evaluated. A DAX UDF is a smaller object: a named, reusable expression with parameters, callable from any measure but invisible to report users.

Read on to see when calculation groups still make sense and when DAX UDFs are the better choice.

Leave a Comment

Building a Multi-Instance Health Check with Powershell

Garry Bargsley puts it all together:

Over the past four weeks, you’ve built a real foundation. You know that single quotes are literal and double quotes expand variables. You know how to store and use data in variables – strings, integers, booleans, and hashtables. You know how to loop over a collection and do something useful to each item, including handling errors when a server doesn’t cooperate.

Now we put it all together. No new concepts this week, just everything you’ve learned working as a single, practical script that solves a real DBA problem. Think of it as your Jedi trials. The training is done. Time to use the Force.

Click through for the script. It’s fairly simple in terms of what it’s actually doing, but gives you a good idea of the types of things you can do.

Leave a Comment