Curated SQL Posts

A Look at Tabular Foundation Models

Michael Mayer tries out a neural network model:

Tabular data has had a comfortable life for years. Gradient boosting showed up, got very good at its job, and then quietly became the default answer to almost everything with rows and columns.

In very recent years, a new player has arrived: the tabular foundation model or prior fitted neural network, and suddenly tabular data is sounding a lot less sleepy…

I’ve done a bit with TabPFN and come away fairly impressed. I’ll have to give this a go as well. There are definite limitations to data sizes before things fall over, but for moderate sizes (50k or fewer rows), TabPFN at least worked pretty well.

Dimensional Testing in Kafka

Jack Vanlightly announces a new tool:

Most of my career in distributed systems has been as a tester, performance engineer and formal verification specialist. I’ve written performance benchmarking tools in the past, for RabbitMQ and Apache Pulsar but in recent years I’ve used OpenMessagingBenchmark (OMB) to run benchmarks against Apache Kafka and other messaging systems. But OMB is hard to deploy and has several limitations compared to more sophisticated benchmarking systems I’ve developed in the past. With Claude becoming so much better since Christmas I decided to write a Kafka-centric performance benchmarking tool, with a lot of inspiration from OMB. I took the bits I like about OMB and the things I like about the tooling I’ve built in the past, to make a performance testing tool for testing Apache Kafka.

Click through for an overview of the tool and how it works.

ORDER BY COALESCE() in PostgreSQL (and SQL Server)

Laetitia Avrot digs in:

I was reading Markus Winand’s latest post on ORDER BY history last week. If you haven’t read it yet, go read it. Markus is one of the best writers on SQL standards, and this post is no exception.

One line stopped me cold. The compatibility table for “expressions on selected columns.” Postgres: partial. PostgreSQL 18: still partial.

That itch needed scratching.

The basic version of this is that, in both PostgreSQL and SQL Server, you can sort by a select-list alias on its own (ORDER BY x), but you cannot reference that alias inside a function call or other expression in the ORDER BY clause. In other words, the following fails:

SELECT a + b AS x
FROM t
ORDER BY COALESCE(x, 0);

Read on for an explanation of why this is the case in PostgreSQL. I’d imagine that the reasoning is about the same for SQL Server.
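One common way around the restriction is to compute the alias in a derived table, so that it is an ordinary column by the time the outer ORDER BY sees it. Here is a minimal sketch of that rewrite, not taken from Laetitia's post. It assumes a hypothetical table t with made-up rows and uses Python's built-in sqlite3 module only so the query can be run anywhere. SQLite's alias resolution rules differ from those of PostgreSQL and SQL Server, so this shows the workaround rather than reproducing the error.

```python
import sqlite3

# In-memory SQLite database; the table t and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER);
    INSERT INTO t (a, b) VALUES (1, 2), (NULL, 5), (-4, 1);
""")

# Compute the alias in a derived table so that x is a real column of the
# outer query; ORDER BY can then wrap it in a function such as COALESCE.
query = """
    SELECT x
    FROM (SELECT a + b AS x FROM t) AS s
    ORDER BY COALESCE(x, 0);
"""
ordered_x = [row[0] for row in conn.execute(query)]
print(ordered_x)
```

The other standard option is to repeat the expression itself in the ORDER BY clause, as in ORDER BY COALESCE(a + b, 0), at the cost of writing the expression twice.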

Adding Foreign Keys and Deadlocks

Michael J. Swart explains a challenge in adding a foreign key to an existing table:

Schema modification locks (SCH-M) are taken by DDL (Data Definition Language) statements like CREATE/ALTER/DROP.
Schema stability locks (SCH-S) are taken by DML (Data Manipulation Language) statements like INSERT/UPDATE/DELETE.

Those two types of locks are incompatible. Meaning, I can’t get a SCH-S lock on some table if you’ve already got a SCH-M lock on it (and vice versa).
Paul Randal describes the SCH-M lock as a super-table-X lock. It makes sense to me, if I’m half way through querying a table, I don’t want its definition to change.

Such a pessimistic lock can be awkward for a busy system. The SCH-M can cause a lot of blocking. For example, creating (and dropping) foreign keys requires a SCH-M lock not only on the parent table, but also on the referenced table which leads to trouble.

Click through for a demonstration of the problem. Michael also has some guidance on how to minimize the issue. I’d note the degenerate form of this guidance: understand your data model up-front and apply foreign key constraints at table creation time. That’s not always possible, sure, so when you can’t do that, Michael has some good advice.

Explaining the Fabric Ontology

James Serra takes us through a big word:

For years, most data conversations have started with tables. We ask where the data lives, what columns are available, how the joins work, and whether the data is in a warehouse, lakehouse, semantic model, or some other system. That makes sense, because tables are how most of us have worked with data for decades. But tables are not how the business thinks.

A business thinks in terms of customers, products, orders, shipments, assets, flights, runways, employees, policies, and actions. The problem is not usually a lack of data. The problem is a lack of shared meaning. Organizations often have the same business concept represented multiple ways across teams and systems, creating what I would call semantic drift. Sales may define a customer one way. Finance may define it another way. Operations may have yet another version in a different system with different keys, names, and assumptions. That is exactly where Fabric Ontology becomes important. It is designed to close the gap between physical data structures and business meaning.

Microsoft is a bit late to the ontology game and their current concept of an ontology shows it. I can understand where they’re going but they still have a ways to go.

Working with PIVOT and UNPIVOT in SQL Server

Ed Pollack explains a pair of operators:

There are few operators in T-SQL that cause developers to scramble for documentation more than PIVOT and UNPIVOT. Beyond documentation, transforming columns into rows (and vice-versa) can often be confusing and frustrating for those of us tasked with reformatting data for use by an application.

This article walks through PIVOT and UNPIVOT, providing examples of simple use cases for both – as well as some more complex scenarios we can run into in real-world data. These can be extraordinarily useful ways to reformat data efficiently and quickly with less code than the alternatives. So, there is no need to fear them again!

Click through for Ed’s article. I definitely don’t fear either PIVOT or UNPIVOT and they can be quite useful. But if you locked me in a room and I couldn’t leave until I came up with the proper syntax for both from memory, well, I’d be in that room for a while.

The Pain of NULL

Louis Davidson explains the unknown:

There is no simpler topic in relational comparisons than three valued logic. I am being mostly facetious about this, but in reality, it seems so simple that people don’t think about how a NULL works, and make mistakes all of the time. I was reading a post about this the other day on LinkedIn (which by no means could one ever find again!) where one of the comments chastised the author of the post for not understanding “the fundamentals” of relational theory. The original poster wasn’t completely right (and my post may not be completely perfect either, though I will back most of what I write with code.)

In this post I want to point out a few of the key basics that one really should understand.

Click through for a primer on what NULL means and doesn’t mean. And by the time you’re done, I’d like to interest you in the power of 6th normal form, where you can effectively banish NULL into the abyss (at least until you join the bits back together).
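For a quick taste of the three-valued logic in question, here is a small sketch that is not drawn from Louis's post. It uses Python's built-in sqlite3 module as a convenient stand-in engine, and the helper name scalar and the table vals are hypothetical. SQLite reports TRUE as 1, FALSE as 0, and UNKNOWN as NULL, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Run a single-value query and return that value (hypothetical helper)."""
    return conn.execute(sql).fetchone()[0]

# Comparing NULL with anything, even another NULL, yields UNKNOWN.
null_equals_null = scalar("SELECT NULL = NULL")
# IS NULL is the test that actually returns TRUE or FALSE.
null_is_null = scalar("SELECT NULL IS NULL")
# One NULL in the list makes NOT IN unknown when no other element matches.
not_in_with_null = scalar("SELECT 1 NOT IN (2, NULL)")
# UNKNOWN AND FALSE is FALSE; UNKNOWN OR TRUE is TRUE.
unknown_and_false = scalar("SELECT NULL AND 0")
unknown_or_true = scalar("SELECT NULL OR 1")

# WHERE keeps only rows whose predicate is TRUE, so the NULL row is
# returned by neither v = 1 nor v <> 1.
conn.executescript("""
    CREATE TABLE vals (v INTEGER);
    INSERT INTO vals (v) VALUES (1), (2), (NULL);
""")
rows_not_one = scalar("SELECT COUNT(*) FROM vals WHERE v <> 1")
rows_one = scalar("SELECT COUNT(*) FROM vals WHERE v = 1")

print(null_equals_null, null_is_null, not_in_with_null)
print(unknown_and_false, unknown_or_true, rows_not_one, rows_one)
```

The last two counts add up to two even though the table holds three rows, which is the kind of surprise that catches people out.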

Finding Bad Queries with sp_QuickieCache

Erik Darling is on a hunt for bad queries:

That’s the funny part. Alright. Cool. With that out of the way, let’s look at this new store[d] procedure. Uh, I think I have to go to Management Studio. Yeah, I remember what that looks like. Alright. Cool. So, uh, this is, this is it. SPQuickieCache. Pay no attention to the terrible red squiggly underlines.

I think Erik made his AI business partner angry, as it didn’t strip out any of the filler words from the transcript. But this does look like a neat stored procedure.

CLUSTER BY in Microsoft Fabric Data Warehouse

Nikola Ilic shows off a relatively new feature:

The first thing every Fabric architect reaches for in this situation is the usual playlist: check the query plan, look at the joins, validate the statistics, maybe scale up the capacity. All worth doing, but none of those things addressed what was actually happening: the warehouse was scanning the entire table for every filtered query, because there was no way to tell it which Parquet files actually contained the rows we cared about.

However, Microsoft shipped data clustering in preview at the end of November 2025, and the entire conversation changed.

In this article, I want to walk you through what data clustering is, how it works under the hood, and most importantly, I’ll show you a real demo on a 100-million-row clickstream table that you can run in your own warehouse. No abstractions, no marketing numbers, but actual T-SQL you can paste.

Some of the notes Nikola mentions remind me of some of the rules around making columnstore indexes work, and for much the same reason. But as Nikola’s demo shows, this is definitely a “You must be this tall to ride the ride” feature, and unless you’re talking about quite large fact tables with (at a minimum) billions of rows of data, the benefit mostly comes from reducing CUs rather than wall clock time improvements.

Get-SqlSafe: A Baseline Assessment for SQL Server Security

Andreas Wolter has a new tool:

In my experience with SQL Server security assessments, many environments show typical patterns: excessive permissions, weak or missing auditing, legacy authentication exposure, risky configuration choices, and ownership or access-control drift accumulated over years.

Get-SqlSafe Community Edition was released to give teams and also consultants a practical first look at those high-level indicators. It is a free PowerShell-based assessment tool for Microsoft SQL Server, supporting all versions from SQL Server 2016-2025 that helps surface baseline issues before they turn into deeper security problems.

Click through for the link to grab Get-SqlSafe Community Edition, as well as more information on how it works.
