
Author: Kevin Feasel

Updating a Mean without Original Data Points

John Cook has an interesting solution:

This post will look at the problem of updating an average grade as a very simple special case of Bayesian statistics and of Kalman filtering.

Suppose you’re keeping up with your average grade in a class, and you know your average after n tests, all weighted equally.

Click through for the walkthrough. This is similar to something I tried to puzzle out before ultimately admitting defeat: is there a way to calculate updates to the median without needing to know the entire set? In practical terms, how many pieces of information do I need in order to guarantee that I can maintain a median over time?
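The mean, by contrast, is easy to update incrementally. A minimal sketch of the update John Cook's post builds from (the hypothetical scores below are my own, just for illustration):

```python
# Updating a running average without the original data points:
#     new_avg = old_avg + (new_score - old_avg) / (n + 1)
# You only need the current average and the count n, not the raw scores.

def update_mean(old_avg, n, new_score):
    """Return the mean of n+1 values, given the mean of the first n."""
    return old_avg + (new_score - old_avg) / (n + 1)

scores = [88, 94, 75, 91]
avg, n = 0.0, 0
for s in scores:
    avg = update_mean(avg, n, s)
    n += 1
# avg now equals sum(scores) / len(scores)
```

The same update shows up as a trivial special case of both Bayesian updating and Kalman filtering, which is the framing of the linked post.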

The best I could come up with was built on the premise that new data points are roughly as likely to fall below the median as above it, where each greater-lesser pair cancels the other out. If you have roughly equal numbers of new data points on each side, your “elements of the median” array can be pretty small. The problem is, for any sufficiently small k, where k is the number of elements you keep in memory, a localized run of (without loss of generality) lower-than-median data points can come in and completely wash out your memory. For example, if you keep 3 points in memory and four values below the median arrive, you no longer know what the median is.
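A toy simulation of that failure mode, assuming a hypothetical tracker of my own design that keeps only k values around the median plus counts of discarded values on each side (this is illustrative, not a real algorithm from the post):

```python
import bisect

# Bounded-memory "median tracker": keep k values around the median and
# only counts of everything discarded below/above them. A one-sided run
# of arrivals pushes the true median outside the retained window.

def washout(k, stream):
    buffer = sorted(stream[:k])  # the k values we keep
    low = high = 0               # counts of discarded values per side
    for x in stream[k:]:
        if x < buffer[0]:
            low += 1             # below the window: count it, lose the value
        elif x > buffer[-1]:
            high += 1
        else:
            # inside the window: insert, then evict from the heavier side
            bisect.insort(buffer, x)
            if low <= high:
                low += 1; buffer.pop(0)
            else:
                high += 1; buffer.pop()
        n = low + len(buffer) + high
        median_idx = (n - 1) // 2  # index of the median in full sorted order
        if not (low <= median_idx < low + len(buffer)):
            return False  # the median fell outside what we kept
    return True

washout(3, [10, 20, 30, 1, 2, 3, 4])  # a run of below-median values: fails
washout(3, [10, 20, 30, 1, 40])       # balanced arrivals: survives
```

With k = 3, a short run of below-median arrivals is already enough to push the median's position below everything still in the buffer, at which point its value is unrecoverable.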

Trying to solve this without knowing the shape of the distribution or making any sequencing assumptions is something I failed to do.


Role-Playing Dimensions and Direct Lake Semantic Models

Chris Webb finds a workaround to something that used to work:

Back in September 2024 I wrote a blog post on how to create multiple copies of the same dimension in a Direct Lake semantic model without creating copies of the underlying Delta table. Not long after that I started getting comments that people who tried following my instructions were getting errors, and while some bugs were fixed others remained. After asking around I have a workaround (thank you Kevin Moore) that will avoid all those errors, so while we’re waiting for the remaining fixes here are the details of the workaround.

I look at the set of steps needed to do this and say there has to be a better way.


Tracking Resource Utilization in Performance Monitor

Erik Darling has another video tutorial:

In this video, I delve into the resource metrics tabs of the full dashboard from my free SQL Server performance monitoring tool. I explain how these tabs provide more detailed information compared to the lighter version, including CPU utilization, TempDB usage, memory usage, and a selection of perfmon counters. The goal is to help you quickly identify when your server might be experiencing issues by showing real-time data and historical trends that can pinpoint specific performance bottlenecks. I also highlight how the wait stats section helps in understanding wait statistics, making it easier to diagnose problems related to resource contention and query compilation.

Click through for the video.


Automatic Index Compaction

Rebecca Lewis looks at a new Azure SQL Database preview feature:

Microsoft’s announcement of Automatic Index Compaction is titled ‘Stop defragmenting and start living’. That is not an accident. Brent Ozar has been making the case for years that defragmenting indexes is largely maintenance theater — that external fragmentation barely matters on modern SSDs and shared storage and that nightly rebuild jobs hammer your transaction log and I/O for gains that are difficult to measure.

His sessions on the topic have been circulating for over a decade, and now Microsoft’s own documentation states it plainly: ‘For most workloads, a higher index fragmentation doesn’t affect query performance or resource consumption.’ I believe that may be Brent’s argument almost verbatim in their official docs.

This could be interesting.

By the way, if you want a really deep dive on index maintenance, I’ll point back to a pair of sessions Jeff Moden did for TriPASS (the Triangle Area SQL Server User Group that I run) about 5 years back, which he was gracious enough to let us record. They are very long user group sessions, but they go into detail on exactly which kinds of index write patterns benefit from rebuilds and which ones don’t, as well as a lot more.


Kafka Topic Management in Amazon MSK

Swapna Bandla, et al, dig into a managed service:

If you manage Apache Kafka today, you know the effort required to manage topics. Whether you use infrastructure as code (IaC) solutions or perform operations with admin clients, setting up topic management takes valuable time that could be spent on building streaming applications.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) now streamlines topic management by supporting new topic APIs and console integration. You can programmatically create, update, and delete Apache Kafka topics using familiar interfaces including AWS Command Line Interface (AWS CLI), AWS SDKs, and AWS CloudFormation. With these APIs, you can define topic properties such as replication factor and partition count and configuration settings like retention and cleanup policies. The Amazon MSK console integrates these APIs, bringing all topic operations to one place. You can now create or update topics with a few selections using guided defaults while gaining comprehensive visibility into topic configurations, partition-level information, and metrics. You can browse for topics within a cluster, review replication settings and partition counts, and go into individual topics to examine detailed configuration, partition-level information, and metrics. A unified dashboard consolidates partition topics and metrics in one view.

In this post, we show you how to use the new topic management capabilities of Amazon MSK to streamline your Apache Kafka operations. We demonstrate how to manage topics through the console, control access with AWS Identity and Access Management (IAM), and bring topic provisioning into your continuous integration and continuous delivery (CI/CD) pipelines.
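The topic properties the announcement calls out (partition count, replication factor, retention and cleanup settings) map onto standard Kafka topic configuration. A hedged sketch of how a CI/CD pipeline might represent and sanity-check such a spec before submitting it; the spec shape and `validate_topic_spec` helper are hypothetical, not the actual MSK API:

```python
# Illustrative only: a plain topic spec using real Kafka topic config
# keys (cleanup.policy, retention.ms), validated before provisioning.

def validate_topic_spec(spec):
    """Basic sanity checks on a topic definition; returns a list of errors."""
    errors = []
    if spec.get("num_partitions", 0) < 1:
        errors.append("num_partitions must be >= 1")
    if spec.get("replication_factor", 0) < 1:
        errors.append("replication_factor must be >= 1")
    cleanup = spec.get("configs", {}).get("cleanup.policy", "delete")
    if cleanup not in ("delete", "compact", "compact,delete"):
        errors.append(f"unknown cleanup.policy: {cleanup}")
    return errors

orders_topic = {
    "name": "orders",
    "num_partitions": 6,
    "replication_factor": 3,
    "configs": {
        "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # 7-day retention
        "cleanup.policy": "delete",
    },
}
assert validate_topic_spec(orders_topic) == []
```

Validating the spec up front keeps a typo in, say, the cleanup policy from failing late in a deployment, regardless of whether the actual call goes through the CLI, an SDK, or CloudFormation.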

Read on to see what the experience looks like using the MSK console.


Query Performance Tabs in Performance Monitor

Erik Darling has a new video:

In this video, I dive into the query performance tabs within my full dashboard, a free and open-source SQL Server monitoring tool. I highlight key features such as the QueryStore subtab, which provides detailed insights into query execution plans and regressions over time, helping you identify and address performance issues more effectively. Additionally, I explain how the custom trace patterns tab offers valuable data for safeguarding against overlooked performance anomalies, ensuring a comprehensive view of your SQL Server’s health.

Click through to see these capabilities.


Building a Health Check with dbatools

David Seis has a dashboard:

After a longer break than expected, I am back to the dbatools for SQL Server DBAs blog series, and this one is a monster! In fact, this project is so big that I think it will be better to release it incrementally. This will allow you to clone the repository, test and modify it in your own environment, and come back for free updates over the weeks/months to come. It is designed to allow you to customize it to your environment’s needs. This first post will be the initial process overview as well as the first set of checks and how to navigate the report.

Click through to see what David has in store.


Diagnosing a textConnection() Slowdown in R

Yihui Xie looks into an issue:

Running quarto render on a document with that single chunk took 35 seconds. The equivalent rmarkdown::render() finished in under half a second. As a side note in the issue, the reporter pinged me that the same problem existed in litedown. litedown is independent of both Quarto and knitr; it executes R code through xfun::record(). That is where I started looking.

Click through for the discovery process, explanation, and fix.
