Day: March 6, 2026

Diskless Topics in Apache Kafka

Paul Brebner extends a metaphor:

I’ve been tracking the progress of Apache Kafka “Diskless Topics” for a while now. It’s a topic that sparks curiosity—mostly because the name itself sounds like an oxymoron. How can a topic be diskless? Where does the data go? 

With the recent voting on KIP-1150, I decided it was time to dive deep into the architectural changes. There are several related Kafka Improvement Proposals (KIPs) floating around, but KIP-1150 is dependent on KIP-1163 and KIP-1164, and the designs are still in flux. Consider this blog post a “theory” in the true scientific sense: a best-guess model based on current evidence that will almost certainly evolve. 

Click through for your moment of zen.

Creating Fabric Linked Service Parameters for ADO Deployment

Koen Verbeeck glues together several technologies:

Quite the title, so let me set the stage first. You have an Azure Data Factory instance (or Azure Synapse Pipelines) and you have a couple of linked services that point to Fabric artifacts such as a lakehouse or a warehouse. You want to deploy your ADF instance with an Azure DevOps build/release pipeline to another environment (e.g. acceptance or production), and this means the linked services need to change as well because in those environments the lakehouse or warehouse is in a different workspace (and also has different object IDs).

When you want to deploy ADF, you typically use the ARM template that ADF automatically creates when you publish (when your instance is linked with a git repo). More information about this setup can be found in the documentation. To parameterize certain properties of a linked service, you can use custom parameterization of the ARM template. Anyway, long story short, I tried to parameterize the properties of the Fabric linked service. 

Read on to see how that went, as well as what you need to do to solve this issue.

Scan Types in PostgreSQL

Warda Bibi lays out four classes of scan in PostgreSQL:

To understand how PostgreSQL scans data, we first need to understand how PostgreSQL stores it.

  • A table is stored as a collection of 8KB pages (by default) on disk.
  • Each page has a header, an array of item pointers (also called line pointers), and the actual tuple data growing from the bottom up.
  • Each tuple has its own header containing visibility info: xmin, xmax, cmin/cmax, and infomask bits.
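To make that layout concrete, here is a rough back-of-the-envelope sketch of how many rows fit on one page. The 8KB page size comes from the text above; the other sizes (a 24-byte page header, 4-byte line pointers, a tuple header padded to 24 bytes, 8-byte alignment) are assumptions about a typical default build, and the function names are made up for illustration. It ignores null bitmaps, fillfactor, and TOAST, so treat the result as an estimate.

```python
PAGE_SIZE = 8192         # default page size (8KB), as stated above
PAGE_HEADER_SIZE = 24    # assumption: fixed-size page header
LINE_POINTER_SIZE = 4    # assumption: one line pointer per tuple
TUPLE_HEADER_SIZE = 24   # assumption: 23-byte tuple header padded to 24
ALIGNMENT = 8            # assumption: tuples start on 8-byte boundaries


def align(size, boundary=ALIGNMENT):
    """Round size up to the next multiple of boundary."""
    return (size + boundary - 1) // boundary * boundary


def tuples_per_page(payload_bytes):
    """Estimate how many tuples with the given column payload fit on one page."""
    tuple_size = align(TUPLE_HEADER_SIZE + payload_bytes)
    usable = PAGE_SIZE - PAGE_HEADER_SIZE
    # Each tuple costs its aligned size plus one line pointer.
    return usable // (tuple_size + LINE_POINTER_SIZE)


if __name__ == "__main__":
    for payload in (8, 100, 1000):
        print(payload, "bytes of column data ->", tuples_per_page(payload), "tuples per page")
```

Under these assumptions, narrow rows are dominated by per-tuple overhead: 8 bytes of column data costs 36 bytes on the page once the header and line pointer are counted.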

There are different ways PostgreSQL can read data from disk. Depending on the query and available indexes, it can choose from several scan strategies:

  1. Sequential Scan 
  2. Index Scan
  3. Index-Only Scan
  4. Bitmap Index Scan
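A toy model can show why the choice between these strategies matters. The sketch below is not how the planner works (it uses cost estimates, and this ignores caching entirely); it only counts heap page fetches under simplified assumptions. Row locations are (page, offset) pairs, and all function names and sample data are hypothetical.

```python
def seq_scan_page_visits(total_pages):
    """A sequential scan reads every page of the table once."""
    return total_pages


def index_scan_page_visits(tids):
    """Follow row locations in index order; count a fetch whenever the page changes."""
    visits = 0
    last_page = None
    for page, _offset in tids:
        if page != last_page:
            visits += 1
            last_page = page
    return visits


def bitmap_scan_page_visits(tids):
    """Collect the matching pages first, then fetch each distinct page once."""
    return len({page for page, _offset in tids})


if __name__ == "__main__":
    # Matching rows in index key order, scattered across heap pages 2, 5, and 7.
    matches = [(7, 1), (2, 3), (7, 2), (2, 1), (5, 4), (7, 5)]
    print("sequential:", seq_scan_page_visits(total_pages=10))
    print("index:     ", index_scan_page_visits(matches))
    print("bitmap:    ", bitmap_scan_page_visits(matches))
```

With the sample data, the index-order walk keeps bouncing between the same pages, while the bitmap approach touches each matching page once. An index-only scan would sit outside this model, since its point is to avoid heap fetches where it can.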

Read on for a description of those types, as well as when it makes sense for the database engine to select a particular scan type.

Alerting People in Microsoft Teams from Data Factory Pipelines

Andy Brownsword sends a message:

Whether running Data Factory, Synapse, or Fabric pipelines, things go wrong – and the de facto response is to send an email. We’ve looked at sending emails from pipelines before, but at scale they can become noise and are easy to ignore.

A more effective option is to surface alerts where collaboration already exists, such as Teams.

In this post we’re going to start looking at using Teams and consolidate notifications into a channel. This functionality gives team members visibility, the ability to update in threads, and the option to tag people for a tighter response loop than typical emails bring.

Click through for the process.

Where the Buck Stops

Louis Davidson talks slop:

I loathe the phrase AI Slop. I have said it before, I don’t like the phrase because it is generally attributed to some content that a person has posted. I blame the poster, not the generator. We all use AI these days, just like they used tractors to farm, computers to do accounting work, and CGI to produce movies. These are all tools.

But when I sign my name to something, it is really and truly mine. In this blog, I will discuss this and more. So as the title says, don’t blame AI, Google, a person’s teachers in grade school, nope. Blame the person who said, “This is good enough to put out in my name”, or in other words, the person in the byline. For this post and video, that is Louis Davidson.

I understand where Louis is going with this and it’s fair. When you publish something, the person ultimately responsible looks suspiciously like the picture on your driver’s license. But I think “AI slop” can still serve as a useful descriptive term for a category of garbage output without removing agency from the perpetrator.

Performance Studio

Erik Darling has a new free tool:

Stop clicking through SSMS execution plans like it’s 2005.
Performance Studio is a free, open-source plan analyzer that tells you what’s wrong,
where it’s wrong, and how bad it is — from the command line, a desktop GUI,
an SSMS extension, or an AI assistant.

Built by someone who has stared at more execution plans than any reasonable person should.

Click through for some of its capabilities, as well as how to get your hands on a copy.
