
Day: May 14, 2026

When R^2 Misleads

Holger von Jouanne-Diedrich explains a common quality metric for regression analyses:

A high R^2 can make a regression model look impressively accurate — but this number can be deceptive. If you want to understand why a high R^2 is not always a sign of a good model, read on!

Click through for that explanation. This post does a fantastic job of explaining the technical reasons why a high R^2 might not indicate a good model specification. But I’d add one other piece to the puzzle: what constitutes a high R^2 depends very much on the domain. For example, if you are performing a regression of some process in physics, an R^2 of 0.90 is probably so low as to indicate you’ve made a horrible mistake somewhere.

By contrast, an R^2 of 0.90 in a social sciences analysis would get you laughed out of the room for obviously faking the data or rigging the specification to get a number that high.
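To make the point concrete, here is a minimal Python sketch (pure standard library; the data are invented for illustration) of a plainly wrong specification that still posts a high R^2: a straight line fit to data generated from y = x^2. For simple linear regression, R^2 equals the squared Pearson correlation.

```python
from statistics import mean

xs = list(range(1, 11))
ys = [x ** 2 for x in xs]          # the true relationship is quadratic

mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

r_squared = sxy ** 2 / (sxx * syy)
print(f"R^2 = {r_squared:.3f}")    # roughly 0.95, despite the wrong model

# The residuals give the game away: they are systematically curved,
# positive at the ends of the range and negative in the middle.
slope = sxy / sxx
intercept = my - slope * mx
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
```

A residual plot, not R^2 alone, is what exposes this kind of misspecification.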

Leave a Comment

Metadata-Driven Frameworks for Change Detection in Microsoft Fabric

Kevin Chant builds a table:

I had various options for this month’s contribution due to my experience with various change detection solutions, including Azure Synapse Link for SQL Server 2022, which I covered in previous posts (including one on some excessive file tests for it).

In the end, I decided to cover developing metadata-driven frameworks for Microsoft Fabric, because it is such a hot topic for multiple reasons. One of those is the growing availability of open-source, metadata-driven frameworks for Microsoft Fabric.

Read on for three such frameworks and some advice on how to use them.
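The core idea behind any metadata-driven framework is that a control table describes the work and one generic routine executes it. A minimal Python sketch of that pattern, with a hypothetical control table (the table names, watermark columns, and values are all invented for illustration):

```python
# A control list describes each source table and its incremental watermark;
# one generic routine builds the extract query for every entry, so adding
# a table means adding a metadata row, not writing new pipeline code.
control_table = [
    {"source": "dbo.Sales", "watermark_column": "ModifiedDate",
     "last_watermark": "2026-05-01T00:00:00"},
    {"source": "dbo.Customers", "watermark_column": "UpdatedAt",
     "last_watermark": "2026-05-07T12:00:00"},
]

def build_incremental_query(entry: dict) -> str:
    """Generate the incremental extract query for one control-table row."""
    return (
        f"SELECT * FROM {entry['source']} "
        f"WHERE {entry['watermark_column']} > '{entry['last_watermark']}'"
    )

queries = [build_incremental_query(e) for e in control_table]
for q in queries:
    print(q)
```

Real frameworks add watermark updates, logging, and error handling on top, but the shape is the same: metadata in, generated work out.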

Leave a Comment

Creating Better Scatterplots

Ruben Van de Voorde embraces the second dimension:

Scatterplots are in a weird place in Power BI reports. They’re incredibly good at their core business: showing how two metrics relate across many things, like products, customers, or suppliers.

But they can miss the landing in a few ways. Sometimes the relationship itself matters but the chart asks the reader to do too much inference: “Why should I care about a product’s Gross Margin % vs. Shipping Weight?” Other times, the reader can’t tell what the dots actually are. A reader asking “What does one dot represent?” is the clearest tell, sometimes followed by “Can’t this be a table instead of these dots?”

Click through for musings about scatterplots, their bubble plot cousins, and what’s available in DAX and Power BI to make them work for you.

Leave a Comment

Semantic Sonar: Canary Testing Power BI Semantic Models

Jens Vestergaard has a new tool:

Quarterly sales season always brings the same ritual: a business user opens Power BI, clicks into the sales dashboard, and waits for the numbers to load. It has been a while since anyone looked at this particular report. The numbers look off. Someone hits refresh. The spinner turns, but the data does not budge. The last refresh was weeks ago.

That is when the emails start. “Can you check why the sales model hasn’t updated?” The support thread grows. IT investigates. The answer is usually not in the data. The refresh failed at 3 AM. Or the data source changed its connection string last Tuesday. Or someone revoked the service principal without telling anyone.

By the time the root cause is found, the report has been open in meetings for hours, and the numbers have been wrong for days.

Read on for a free tool that can prevent this sort of issue.

I think the biggest IT superpower is the ability to troubleshoot a problem consistently from limited information. The second-biggest superpower is the ability to discover issues before the people who rely on your services do. Then you fix those issues, and your users never know there was even a potential problem.
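At its simplest, a canary check like this boils down to asking “when did this model last refresh successfully?” and alerting when the answer is too long ago. A minimal Python sketch, assuming refresh-history records shaped roughly like what Power BI’s refresh history returns (the exact field names here are an assumption; verify against the real payload):

```python
# Staleness check for a semantic model: flag it when the most recent
# *successful* refresh is older than the allowed age.
from datetime import datetime, timedelta, timezone

def is_stale(refreshes: list[dict], max_age: timedelta,
             now: datetime) -> bool:
    """Return True if the last successful refresh is older than max_age."""
    completed = [
        datetime.fromisoformat(r["endTime"])
        for r in refreshes
        if r.get("status") == "Completed"
    ]
    if not completed:
        return True  # never refreshed successfully: definitely stale
    return now - max(completed) > max_age

now = datetime(2026, 5, 14, 9, 0, tzinfo=timezone.utc)
history = [
    {"status": "Completed", "endTime": "2026-04-28T03:00:00+00:00"},
    {"status": "Failed",    "endTime": "2026-05-14T03:00:00+00:00"},
]
print(is_stale(history, timedelta(days=2), now))  # True
```

Note that the check keys off the last *success*, not the last attempt: the scenario in the quote is exactly one where recent refreshes ran but failed.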

Leave a Comment

Tracking Data Changes

Louis Davidson shows a technique:

A few months ago, I wrote a post about comparing sets of data in SQL. That focused on a type of challenge that is often a one-off. The techniques listed (along with a tool like Redgate’s SQL Data Compare) aren’t the point of this post, but they are related, because when you do change detection from a source, it is super important to check your results occasionally. Having a copy of the complete source to compare to your destination (even if it is just checksums of the data) is important.

The techniques that Meagan is asking about this month are more ETL-related, where you watch a stream of data and sync changes as they are made. Typically, you don’t want to compare all the rows in a set, just the ones that have changed: at some frequency, give me the changes to one set of data and keep another copy up to date with it.

I’m still partial to using HASHBYTES() on the fields I care about because I don’t trust modified dates unless I know the only way to access that table is via stored procedure, and all of the stored procedures handle updating the modified date correctly.
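The hash-based approach is easy to sketch outside of T-SQL as well: store a hash over the columns you care about, then flag rows whose recomputed hash differs. A Python analogue of the HASHBYTES() pattern using hashlib (the table shape and values here are invented for illustration):

```python
# Hash-based change detection, analogous to T-SQL's
# HASHBYTES('SHA2_256', ...) over concatenated columns.
import hashlib

SEP = "\x1f"  # delimiter between fields, so 'ab'+'c' != 'a'+'bc'

def row_hash(row: dict, columns: list[str]) -> str:
    """Hash only the columns we care about, in a fixed order."""
    payload = SEP.join(str(row.get(c)) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

columns = ["name", "city"]  # the fields we care about (invented schema)
old = {1: {"name": "Ada", "city": "London"},
       2: {"name": "Grace", "city": "Arlington"}}
stored = {k: row_hash(v, columns) for k, v in old.items()}

new = {1: {"name": "Ada", "city": "Cambridge"},   # changed
       2: {"name": "Grace", "city": "Arlington"}} # unchanged
changed = [k for k, v in new.items() if row_hash(v, columns) != stored[k]]
print(changed)  # [1]
```

The delimiter matters: concatenating fields without one can make two different rows hash identically. In T-SQL you get the same protection by adding a separator or using CONCAT_WS.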

For row comparisons, I’d also look at EXCEPT. You do need to run it in both directions, so the pattern is more like (A EXCEPT B) UNION ALL (B EXCEPT A), but it works great and natively handles any NULLs along the way.
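The two-direction EXCEPT pattern maps neatly onto set symmetric difference. A small Python sketch, modeling rows as tuples and NULL as None (the sample rows are invented); like SQL’s EXCEPT, this treats two NULLs in the same position as matching, which plain equality comparisons in a WHERE clause would not:

```python
# (A EXCEPT B) UNION ALL (B EXCEPT A), modeled with Python sets.
# None stands in for NULL; tuple equality treats matching Nones as equal,
# mirroring how EXCEPT compares NULLs.
a = {(1, "Ada"), (2, "Grace"), (3, None)}
b = {(1, "Ada"), (2, "Hopper"), (3, None)}

only_in_a = a - b             # A EXCEPT B
only_in_b = b - a             # B EXCEPT A
diff = only_in_a | only_in_b  # full symmetric difference

print(sorted(diff))  # [(2, 'Grace'), (2, 'Hopper')]
```

Row (3, None) drops out entirely, just as a row with matching NULLs drops out of a two-way EXCEPT: it exists identically on both sides.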

Leave a Comment