Press "Enter" to skip to content

Day: April 6, 2026

Updating a Mean without Original Data Points

John Cook has an interesting solution:

This post will look at the problem of updating an average grade as a very simple special case of Bayesian statistics and of Kalman filtering.

Suppose you’re keeping up with your average grade in a class, and you know your average after n tests, all weighted equally.

Click through for the walkthrough. This is similar to something I tried to puzzle out but ultimately admitted defeat: is there a way to calculate updates to the median without needing to know the entire set? In practical terms, this would be something like, how many pieces of information do I need to guarantee that I can maintain a median over time?

The best I could come up with was built along the premise of the likelihood of new data points being less than the median versus those greater than the median, where each pair of greater-lesser cancel each other out. If you have roughly equal numbers of new data points to each side, your “elements of the median” array can be pretty small. But the problem is, for any sufficiently small k, where k represents the number of elements you keep in memory, it is possible for a localized collection of (without loss of generality) lower-than-median data points to come in and completely wash out your memory. For example, if you kept 3 points and memory and you have four values below the median, you no longer know what the median is.

Trying to solve this without knowing the shape of the distribution or make any sequencing assumptions is something that I failed to do.

Leave a Comment

Tracking Resource Utilization in Performance Monitor

Erik Darling has another video tutorial:

In this video, I delve into the resource metrics tabs of the full dashboard from my free SQL Server performance monitoring tool. I explain how these tabs provide more detailed information compared to the lighter version, including CPU utilization, TempDB usage, memory usage, and a selection of perfmon counters. The goal is to help you quickly identify when your server might be experiencing issues by showing real-time data and historical trends that can pinpoint specific performance bottlenecks. I also highlight how the weight stats section helps in understanding wait statistics, making it easier to diagnose problems related to resource contention and query compilation.

Click through for the video.

Leave a Comment

Role-Playing Dimensions and Direct Lake Semantic Models

Chris Webb finds a workaround to something that used to work:

Back in September 2024 I wrote a blog post on how to create multiple copies of the same dimension in a Direct Lake semantic model without creating copies of the underlying Delta table. Not long after that I started getting comments that people who tried following my instructions were getting errors, and while some bugs were fixed others remained. After asking around I have a workaround (thank you Kevin Moore) that will avoid all those errors, so while we’re waiting for the remaining fixes here are the details of the workaround.

I look at the set of steps needed to do this and say there has to be a better way.

Leave a Comment