Press "Enter" to skip to content

Curated SQL Posts

A Primer on Partitioned Views

Erik Darling talks about an old-style way of partitioning in SQL Server:

Erik Darling here with Darling Data. And we’re going to finish off this Friday by talking about partitioned views. And look, there are a lot of things I could say about partitioned views that are great and grand and that have come in handy for me over the years in ways that I’m like, wow, thank you partitioned views. Thank you for not being normal table partitioning. Thank you for existing. 

Read on to see how they work, how you can write into them, things that might prevent you from writing into partitioned views directly, and why you probably don’t want writable partitioned views anyhow.

Leave a Comment

Exceeding the Capacity Limit for Power BI Dataset Refreshes

Chris Webb explains an error:

If you have a lot of Power BI semantic models that are scheduled to refresh at the same time in the Service then you may find that some of them fail with the following error:

You’ve exceeded the capacity limit for dataset refreshes. Try again when fewer datasets are being processed.

[Note: “dataset” is the old name for a Power BI semantic model. Someone should update the error message.]

Read on to see what can cause this error and what you can do about it.

Leave a Comment

What’s New in Cassandra 6

Mariah McLaughlin lays out some of the new features in the latest version of Cassandra:

Accord is a general-purpose transaction framework that uses a leaderless consensus protocol to have highly available transactions and is used in Cassandra 6. The goal is broader transactional support across multiple keys, with strict serializable isolation and without a central bottleneck.

This matters because multi-key consistency is hard to handle cleanly in application code. Once a workflow spans more than one partition, the application often ends up doing coordination work that really belongs in the database.

Accord enables ACID behavior on transactional tables, which lets developers coordinate multi-step, multi-partition changes with stronger correctness guarantees, reducing the amount of custom consistency logic they have to build in the application.

Click through for more information on this, as well as a few other significant features.

Leave a Comment

Scoring the Quality of Binary Classification with SQL Server

Sebastiao Pereira quantifies a result:

Machine Learning (ML) is a way of teaching computers to learn from data instead of being explicitly programmed. Performance metrics are essential tools for understanding how well a model actually works. They tell you not just how accurate the model is, but how reliablefair, and useful it will be in real-world applications. In other words, without them, machine learning would be a trial-and-error guesswork.

Binary classification is when each sample is labeled as one of two mutually exclusive classes, referenced to a categorization, like positive or negative.

How do you implement the binary classification performance metric in SQL Server without using external tools?

Click through for a series of metrics to determine how well a binary classification process performed. This post doesn’t include details on how to perform the classification, just what to do once you have the results.

Leave a Comment

Eleven CVEs for PostgreSQL

Christophe Pettus takes us through just shy of a dozen security issue fixes:

PostgreSQL 18.4, 17.10, 16.14, 15.18, and 14.23 are out as of May 14, 2026. The release fixes eleven security issues and more than sixty bugs. That is not a typo. Eleven CVEs is the largest single-release security batch I can remember, and three of them are CVSS 8.8 with practical exploitation paths. Patch this week. If you can patch tomorrow, patch tomorrow.

Click through for a rundown.

Leave a Comment

Partitioning and Columnstore Indexes

Erik Darling puts together a great combination for a very large dataset:

 So, today we’re going to talk about partitioning in columnstore because there are important differences between partitioned columnstore tables and partitioned rowstore tables. One of the sort of superpowers that columnstore has is the ability to use metadata about which row groups have which data in them, and it can skip entire segments that do not contain relevant data.

I agree with Erik’s point that you do need around 500 million or so rows before this capability really shines, but if you do pick the right partition key, you get one of those rare and coveted performance improvements from partitioning.

Leave a Comment

When R^2 Misleads

Holger von Jouanne-Diedrich explains a common quality metric for regression analyses:

A high R^2 can make a regression model look impressively accurate — but this number can be deceptive. If you want to understand why a high R^2 is not always a sign of a good model, read on!

Click through for that explanation. This post does a fantastic job of explaining the technical reasons why a high R^2 might not be indicative of a good model specification. But I’d add one other piece to the puzzle: what constitutes a high R^2 will depend very much on the domain. For example, if you are performing a regression of some process in physics, an R^2 of 0.90 is probably so low as to indicate you’ve made a horrible mistake somewhere to have a number so low.

By contrast, an R^2 of 0.90 in the context of a social studies analysis would get you laughed out of the room for obviously faking the data or misunderstanding the specification to get a number that high.

Leave a Comment

Metadata-Driven Frameworks for Change Detection in Microsoft Fabric

Kevin Chant builds a table:

I had various options for this months contribution due to my experience with various change detection solutions. Including Azure Synapse Link for SQL Server 2022. Which I covered in previous posts. Including one that covered some excessive file tests for Azure Synapse Link for SQL Server 2022.

In the end I decided to cover developing metadata-driven frameworks for Microsoft Fabric. Due to the fact that it is such a hot topic for multiple reasons. One of which is the growing availability of open-source, metadata-driven frameworks for Microsoft Fabric.

Read on for three such frameworks and some advice on how to use them.

Leave a Comment

Creating Better Scatterplots

Ruben Van de Voorde embraces the second dimension:

Scatterplots are in a weird place in Power BI reports. They’re incredibly good at their core business: showing how two metrics relate across many things, like products, customers, or suppliers.

But they can miss the landing in a few ways. Sometimes the relationship itself matters but the chart asks the reader to do too much inference: “Why should I care about a product’s Gross Margin % vs. Shipping Weight?” Other times, the reader can’t tell what the dots actually are. A reader asking “What does one dot represent?” is the clearest tell, sometimes followed by “Can’t this be a table instead of these dots?”

Click through for musings about scatterplots, their bubble plot cousins, and what’s available in DAX and Power BI to make them work for you.

Leave a Comment

Semantic Sonar: Canary Testing Power BI Semantic Models

Jens Vestergaard has a new tool:

Quarterly sales season always brings the same ritual: a business user opens Power BI, clicks into the sales dashboard, and waits for the numbers to load. It has been a while since anyone looked at this particular report. The numbers look off. Someone hits refresh. The spinner turns, but the data does not budge. The last refresh was weeks ago.

That is when the emails start. “Can you check why the sales model hasn’t updated?” The support thread grows. IT investigates. The answer is usually not in the data. The refresh failed at 3 AM. Or the data source changed its connection string last Tuesday. Or someone revoked the service principal without telling anyone.

By the time the root cause is found, the report has been open in meetings for hours, and the numbers have been wrong for days.

Read on for a free tool that can prevent this sort of issue.

I think the biggest IT super-power is the ability consistently to troubleshoot a problem from limited information. The second-biggest super-power is the ability to discover issues before the people who rely on your services do. Then you get them fixed and they have no idea there was even a potential problem.

Leave a Comment