Press "Enter" to skip to content

Day: June 12, 2026

Client-Local Parallelism in Kafka

Jack Vanlightly continues a series on Apache Kafka:

Broker-visible parallelism is simple to reason about: if each consumer processes records serially, we add more consumers to increase parallelism. But each consumer adds overhead to the brokers: broker-side protocol state, TCP connections, group membership, fetch state, and participation in the consumer or share group protocol. With long processing times and/or high throughput, the required number of parallel workers can easily exceed what is practical to model as broker-visible consumers.

That is where client-local parallelism becomes important. Instead of scaling by adding more consumers, each consumer application can poll records and process them concurrently inside the client. This allows a smaller number of Kafka consumers to drive a much larger amount of parallel work.

Read on to learn how client-local parallelism fits and how it performs compared to consumer groups.

Leave a Comment

Comparative Query Analysis and Planning

Christophe Pettus has a two-parter. The first post covers how a half-dozen data platform technologies handle cost-based analysis:

PostgreSQL has ANALYZE. You run it (or autovacuum runs it for you), it draws a sample of 300 × default_statistics_target rows, and it writes a row per column into pg_statistic: a null fraction, an n-distinct estimate, a most-common-values list, an equi-depth histogram, and a physical-vs-logical correlation. The planner reads those numbers, multiplies selectivities together, costs a handful of join strategies, and picks one. Three join algorithms are on the menu: nested loop, merge join, hash join.

That is the entire shape of the problem, and every cost-based optimizer ever shipped solves the same one. They differ in three places, and only three: where the numbers come from, how stale the numbers are allowed to get, and which plan shapes are even legal to choose between. The algorithms are the boring part. Everybody hash-joins. The interesting part is the bookkeeping.

Then there’s how each of the systems generates a query plan:

Statistics are the input. Planning is what the database does with them: it takes a declarative query, which describes what you want and says nothing about how, and turns it into an executable plan, which is nothing but how. There are two jobs inside that. First, rewrite the query into a logically equivalent but more tractable shape, which is where subquery flattening, predicate pushdown, and view merging live. Second, search the space of physical plans (join orders, join algorithms, access paths) for the cheapest one the cost model can find. The second job is the hard one, because the number of possible join orders for a query grows faster than anyone wants to contemplate, and every database in this article is, underneath, a strategy for not enumerating all of them.

Two questions separate the six systems here. How does each one tame that search space? And once it has an answer, how much will it let you argue with the result? Those sound like the same question. They are not, and the most useful thing this comparison does is pull them apart. A database can search brilliantly and refuse you any override at all (Snowflake), search crudely and hand you a fistful of hints anyway (MySQL until recently), or search hard and expose every lever ever machined (Oracle). Sophistication of the search and generosity of the control surface are independent axes. Knowing where a system sits on each tells you most of what its planner feels like to live with.

Slightly odd is that there’s a section of DB2 but not on SQL Server. But it is a good cross-comparison of several of the top relational database options.

Leave a Comment

Building a Type-2 Slowly Changing Dimension

Kristyna Ferris builds a dimension:

This is a blog that I am writing for future me and hopefully it’ll help a few of you save some time too! It’s not often that I get to build out a data warehouse from scratch, but when I do, I want to make sure I do it well with best practices in place. Because this is not something I do a lot of, I frequently forget lessons I’ve learned and have to go back and drop tables to recreate them in the best way before it’s too late. One table type that is vital to do right the first time is a Slowly Changing Dimension Type 2 (SCD2 for short).

Click through for an explanation, as well as example scripts for both SQL Server-adjacent products and the Microsoft Fabric warehouse.

Leave a Comment

Multiple Domain Controllers and Kerberos Double-Hops

Randy Knight sorts out a problem:

We recently had a customer where Windows logins to a SQL Server Availability Group started throwing “Cannot generate SSPI context” — but only sometimes, and only for some people. The same user, on the same workstation, would fail one minute and connect the next. Applications were fine. Nothing had changed on SQL Server.

It took the better part of two days and a lot of second-guessing to run down, and the culprit turned out to be something most of us never think about: which domain controller handed out the Kerberos ticket.

Click through for the answer. Surprisingly, it has nothing to do with SPNs for once.

Leave a Comment

Deprecated SQL Server Features since 2016

Jeff Iannucci has a list:

If you’re planning on upgrading from SQL Server 2016 to the current version of SQL Server 2025, it would be helpful to note some of the features from versions since 2016 that are no longer supported. By that I’m talking about those things classified as “deprecated” (no longer supported) or “discontinued” (removed from the product). Let’s take a look at these features and say our goodbyes, organized by versions.

There’s not a lot on here that would affect most companies. Though to one of Jeff’s points in the article, I actually have seen a company use lightweight pooling, mostly because they didn’t know what it did and saw the bit about how it could improve performance. I came in and flipped that switch off for them. That was the kind of feature that likely benefited one or two large customers 30 years ago or so and it just hung around in the product due to inertia.

Leave a Comment

Power BI’s Row-Level Security Bitmap Changes

Phil Seamark lays out the consequences of a change:

If you have ever watched a model with row-level security run beautifully for months and then quietly fall off a cliff, this one is for you. Nothing in the model changed. You did not touch the security rules. One table just grew past a line you could not see, and every query under that role started doing a great deal more work than it used to.

That line just moved, and it moved in your favour. Here is what it is and why it sits where it does.

Click through to learn more about how Power BI handles row-level security predicate matching.

Leave a Comment