
Curated SQL Posts

Time Delay for Online Checksums in PostgreSQL

Christophe Pettus notes an upcoming change to PostgreSQL 19:

For about fifteen years the answer to “can I turn on data checksums without an initdb?” has been “not really.” pg_checksums showed up in PostgreSQL 12 and made the job survivable, but you still had to shut the cluster down. For anyone running 24×7 production, that has left the same three options: take the downtime, fail over through a checksummed replica, or live without checksums.

PostgreSQL 19 adds a fourth path. A commit from Daniel Gustafsson on April 3rd wires up online enabling and disabling of data checksums: the command completes immediately, and the cluster keeps serving traffic while a background process rewrites every heap and index page in the cluster to carry (or drop) the checksum.
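If the committed interface matches the long-running online-checksums patch series, the whole thing is a function call rather than a utility run — the exact names below are an assumption on my part, so verify them against the PostgreSQL 19 release notes:

```sql
-- Assumed interface from the online-checksums patch series (names unverified)
SELECT pg_enable_data_checksums();   -- returns immediately; a background
                                     -- worker rewrites every page

SHOW data_checksums;                 -- reports the in-progress state, then 'on'

SELECT pg_disable_data_checksums();  -- the reverse operation
```

Compare that with the pre-19 pg_checksums workflow, which requires the cluster to be cleanly shut down first.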

Read on to see what it will do, as well as the consequences.


Storage Load Testing SQL Server

Anthony Nocentino designs a test:

I’ve been doing storage load tests for SQL Server for a long time, both as a consultant and now in my work at Pure Storage, and I see the same patterns over and over. Someone spins up a VM with two vCPUs, points it at a storage subsystem (cloud or on-prem), runs a thousand threads at it, and then concludes that the storage stinks. Or the opposite, where they buy a 64 gigabit HBA, plug it into the wrong PCIe slot, and wonder why they’re leaving half of the capacity on the table.

Designing a good load test isn’t about pushing the biggest number you can find. It’s about knowing what your gear is supposed to do, driving load until you find the point where the system breaks down, and measuring at the right places along the way. In this post, I’m going to walk you through how I think about designing a storage load test, the tools I reach for, and the pitfalls that will invalidate your results if you’re not paying attention.
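For SQL Server-style I/O on Windows, the tool most people reach for is Microsoft’s diskspd. As a purely illustrative starting point (these parameters are not a recommendation — the whole point of Anthony’s post is that you size them to your gear):

```shell
# 60-second random-I/O test: 8 threads with 8 outstanding I/Os each,
# 64 KB blocks, 30% writes, OS caching disabled, latency stats captured
diskspd -c50G -d60 -t8 -o8 -b64K -r -w30 -Sh -L D:\test\iotest.dat
```

Sweeping the thread count and queue depth (-t and -o) while watching latency is how you find the knee of the curve rather than just the biggest number.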

Click through for Anthony’s thought process.


Building Materialized Lake Views in Microsoft Fabric

Nikola Ilic presses the Easy button:

For the longest time, building a medallion architecture in Microsoft Fabric meant stitching together a small orchestra of moving parts: notebooks for the transformations, pipelines for orchestration, schedules for refresh, custom code for data quality checks, and the Monitor Hub for keeping an eye on whether anything actually worked. Every layer worked – until something didn’t, and then you had to figure out which layer broke, why, and which downstream layers got affected along the way.

If you’ve ever tried to debug a silver layer that didn’t update because the bronze notebook failed three hours ago, you know exactly what I’m talking about.

Then, at FabCon Atlanta in March 2026, materialized lake views (MLVs) went generally available. And the story they’re telling is simple: what if your entire medallion pipeline could be a few SELECT statements?

Let me walk you through the whole thing – what they are, how they work, what changed between preview and GA, and where they fit (and where they don’t) in your architecture.
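As a taste of the “few SELECT statements” idea, a silver-layer view might look roughly like this — the names are hypothetical and you should check the Fabric documentation for the exact GA syntax, including the data-quality constraint clauses:

```sql
-- Hypothetical silver-layer MLV; Fabric tracks the bronze dependency
-- and handles refresh ordering for you
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.customers
AS
SELECT customer_id,
       TRIM(customer_name)       AS customer_name,
       CAST(signup_date AS DATE) AS signup_date
FROM bronze.raw_customers
WHERE customer_id IS NOT NULL;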

Read on for that walkthrough.


Generating Sample Data in Fabric Dataflows

Chris Webb builds some data:

Back in December the FabricAI.Prompt() M function was released in Fabric Dataflows Gen2. Most of the people writing about it at that time, as in this great post by my colleague Sandeep Pawar, focused on calling this function for each row in a table – something that the UI in the editor makes easy. However the FabricAI.Prompt() function itself is a lot more flexible. You can use it to summarise whole tables of data as I showed here; you can also use it to generate sample data. This is similar to what I blogged about here, where I got Copilot to generate M code that returned sample data, but using FabricAI.Prompt() is maybe a bit simpler.
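A rough sketch of the sample-data pattern, asking the model for CSV and then parsing it — the prompt and parsing steps here are my own illustration, not Chris’s code, so see the post for the working version:

```m
let
    // Ask the model for sample data in a machine-parseable shape
    Response = FabricAI.Prompt(
        "Generate 10 rows of sample customer data as CSV with columns " &
        "Name, City, Sales. Return only the CSV, no commentary."
    ),
    // Parse the returned text as a table and promote the header row
    Parsed = Csv.Document(Response, [Delimiter = ",", QuoteStyle = QuoteStyle.Csv]),
    Promoted = Table.PromoteHeaders(Parsed)
in
    Promoted
```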

Click through to see how.


Unplanned Failover and SQL Server on Kubernetes

Anthony Nocentino performs additional testing:

In my planned failover walkthrough, I showed what happens when you deliberately move the primary role to another replica. That’s the easy case. Now I want to show what happens when the primary pod just disappears unexpectedly, like during a node failure or a container crash. No graceful shutdown, no demotion, just gone.

I ran two test scenarios, each cycling the primary role across all three pods by force-deleting the current primary three times in a row. First, a 5GB TPC-C database idle. Then, that same 5GB database under sustained HammerDB TPC-C load. Six force-deletes total, six successful automatic failovers. I’ll walk through the error log from the promoted replica, the operator’s detection and recovery behavior, and the full timing data.
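The “just gone” scenario is easy to simulate yourself: a force delete skips the graceful termination period entirely, which is as close to a crash as kubectl gets (the pod name below is hypothetical):

```shell
# Kill the current primary with no grace period, simulating a node
# failure or container crash
kubectl delete pod mssql-ag-0 --grace-period=0 --force

# Watch the operator detect the failure and promote a replica
kubectl get pods -w
```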

Read on to see how Anthony’s SQL Server Kubernetes operator responds when things go bump in the night.


Training, Serving, and Deploying Scikit-Learn Models via FastAPI

Abid Ali Awan serves a model:

In this article, you will learn how to train a Scikit-learn classification model, serve it with FastAPI, and deploy it to FastAPI Cloud.

Topics we will cover include:

  • How to structure a simple project and train a Scikit-learn model for inference.
  • How to build and test a FastAPI inference API locally.
  • How to deploy the API to FastAPI Cloud and prepare it for more production-ready usage.
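The training half of that workflow fits in a few lines of scikit-learn; the FastAPI layer then just wraps the predict call in a POST route. The model choice, dataset, and function names here are my own illustration rather than the article’s exact code:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Train a small classifier and persist it for the API to load at startup
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
joblib.dump(model, "model.joblib")

def predict_species(features: list[float]) -> int:
    """The function a FastAPI POST route would wrap."""
    loaded = joblib.load("model.joblib")
    return int(loaded.predict([features])[0])
```

In the FastAPI app you would load the model once at startup rather than per request, and validate the incoming feature list with a Pydantic model.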

Click through for the process.


Cross-Workspace MLflow Logging Available in Microsoft Fabric

Ruixin Xu announces a feature now generally available:

Cross-workspace logging works through the synapseml-mlflow package, which provides a Fabric-compatible MLflow tracking plugin. The core idea is simple: set the MLFLOW_TRACKING_URI environment variable to point at your target workspace and use standard MLflow commands. Your experiments, metrics, parameters, and registered models land in the workspace you choose — not just the one you’re running in.
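In practice that pattern looks something like the sketch below — the URI format is illustrative and depends on the synapseml-mlflow plugin being installed, so treat it as a configuration outline rather than copy-paste code:

```python
import os
import mlflow

# Point the tracking plugin at the *target* workspace, not the one
# the notebook is running in (URI format is an assumption; check the
# synapseml-mlflow documentation for the exact scheme)
os.environ["MLFLOW_TRACKING_URI"] = "fabric://<target-workspace-name>"

mlflow.set_experiment("cross-workspace-demo")
with mlflow.start_run():
    mlflow.log_param("model_type", "baseline")
    mlflow.log_metric("rmse", 0.42)
```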

Read on for the full announcement.


Filtered Indexes and Computed Columns

Greg Low has a public service announcement:

On a client site some years back, I came across a situation (unfortunately too common) where a column in a table was being used for two purposes. It could either hold an integer value or a string. Only about 100 rows out of many millions had the integer value. Some of the client code needed to calculate the maximum value when it was an integer. First step I tried was to add a persisted computed column like so:

After that, Greg tried to create a filtered index. Read on to see how that worked.
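Greg’s code isn’t reproduced here, but the general shape of the attempt (with hypothetical table and column names) looks something like this:

```sql
-- Hypothetical reconstruction of the pattern, not Greg's actual code
ALTER TABLE dbo.Transactions
    ADD IntReference AS TRY_CAST(Reference AS int) PERSISTED;

-- The natural next step: a tiny filtered index over just the ~100 integer rows
CREATE INDEX IX_Transactions_IntReference
    ON dbo.Transactions (IntReference)
    WHERE IntReference IS NOT NULL;
-- ...except SQL Server does not allow a computed column in a filtered
-- index predicate, which is where the story gets interesting
```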


Testing Your SQL Server Backups

Vlad Drumea stresses the need to test:

Folks think that if they just take backups regularly, and their backup intervals match their recovery point objectives, then they can rest assured knowing (more like falsely thinking) they have backups from which to restore in case something bad happens to a database or to the entire instance.

Yeah, there are several things that can go wrong, even beyond the solid answers Vlad provides. Think about developers taking one-off backups that break the log chain, overly-aggressive purge processes that delete backups before they’re fully moved to long-term storage, etc.
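The only backup you can trust is one you’ve restored. As a minimal sketch (database name and paths are hypothetical), the core loop is a restore under a throwaway name followed by an integrity check:

```sql
-- Restore the latest backup to a scratch copy
RESTORE DATABASE SalesDB_VerifyTest
FROM DISK = N'X:\Backups\SalesDB.bak'
WITH MOVE N'SalesDB'     TO N'X:\Scratch\SalesDB_VerifyTest.mdf',
     MOVE N'SalesDB_log' TO N'X:\Scratch\SalesDB_VerifyTest.ldf',
     STATS = 10;

-- Then prove the restored copy is actually readable
DBCC CHECKDB (SalesDB_VerifyTest) WITH NO_INFOMSGS;
```

Tools like dbatools’ Test-DbaLastBackup automate exactly this cycle, which makes it practical to run on a schedule.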

Definitely check out Vlad’s post. If you’re interested, I also have a video series on YouTube that covers SQL Server backup and restore.
