Press "Enter" to skip to content

Curated SQL Posts

Kafka Topic Management in Amazon MSK

Swapna Bandla, et al., dig into a managed service:

If you manage Apache Kafka today, you know the effort required to manage topics. Whether you use infrastructure as code (IaC) solutions or perform operations with admin clients, setting up topic management takes valuable time that could be spent on building streaming applications.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) now streamlines topic management by supporting new topic APIs and console integration. You can programmatically create, update, and delete Apache Kafka topics using familiar interfaces including the AWS Command Line Interface (AWS CLI), AWS SDKs, and AWS CloudFormation. With these APIs, you can define topic properties such as replication factor and partition count, and configuration settings like retention and cleanup policies.

The Amazon MSK console integrates these APIs, bringing all topic operations to one place. You can now create or update topics with a few selections using guided defaults while gaining comprehensive visibility into topic configurations, partition-level information, and metrics. You can browse for topics within a cluster, review replication settings and partition counts, and drill into individual topics to examine detailed configuration, partition-level information, and metrics. A unified dashboard consolidates partition information and metrics in one view.

In this post, we show you how to use the new topic management capabilities of Amazon MSK to streamline your Apache Kafka operations. We demonstrate how to manage topics through the console, control access with AWS Identity and Access Management (IAM), and bring topic provisioning into your continuous integration and continuous delivery (CI/CD) pipelines.
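For a point of reference, this is roughly what the admin-client route mentioned above looks like today. A minimal sketch using the kafka-python library; the broker endpoint, security settings, and topic values are placeholders, and a cluster using IAM or SASL/SCRAM auth would need different connection options.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Placeholder bootstrap endpoint; for MSK this would be the cluster's
# bootstrap broker string, with whatever auth your cluster requires.
admin = KafkaAdminClient(
    bootstrap_servers="b-1.example.kafka.us-east-1.amazonaws.com:9094",
    security_protocol="SSL",
)

# The topic properties the excerpt mentions: partitions, replication factor,
# and configuration such as retention and cleanup policy.
topic = NewTopic(
    name="orders",
    num_partitions=6,
    replication_factor=3,
    topic_configs={"retention.ms": "604800000", "cleanup.policy": "delete"},
)

admin.create_topics([topic])
admin.close()
```

The new APIs and console integration aim to fold this kind of scripting into the same place you already manage the cluster.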

Read on to see what the experience looks like using the MSK console.

Automatic Index Compaction

Rebecca Lewis looks at a new Azure SQL Database preview feature:

Microsoft’s announcement of Automatic Index Compaction is titled ‘Stop defragmenting and start living’. That is not an accident. Brent Ozar has been making the case for years that defragmenting indexes is largely maintenance theater — that external fragmentation barely matters on modern SSDs and shared storage and that nightly rebuild jobs hammer your transaction log and I/O for gains that are difficult to measure.

His sessions on the topic have been circulating for over a decade, and now Microsoft’s own documentation states it plainly: ‘For most workloads, a higher index fragmentation doesn’t affect query performance or resource consumption.’ I believe that’s Brent’s argument, almost verbatim, in their official docs.

This could be interesting.

By the way, if you want a really deep dive on index maintenance, I’ll point back to a pair of sessions Jeff Moden did for TriPASS (the Triangle Area SQL Server User Group that I run) about five years back, which he was gracious enough to let us record. They are very long user group sessions, but they go into detail on exactly which kinds of index write patterns benefit from rebuilds and which ones don’t, as well as a lot more.
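If you want to see where your own indexes stand before deciding how much any of this matters to you, the numbers come from sys.dm_db_index_physical_stats. Here is a minimal sketch via pyodbc; the connection string, database, and the 1,000-page floor are placeholders to adjust for your environment.

```python
import pyodbc

# Placeholder connection string and database; adjust for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=YourDatabase;Trusted_Connection=yes;TrustServerCertificate=yes;"
)

# LIMITED mode keeps the scan cheap; the page_count floor skips tiny indexes
# whose fragmentation numbers are meaningless anyway.
sql = """
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id = ips.index_id
WHERE ips.page_count >= 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;
"""

for table, index, frag, pages in conn.execute(sql):
    print(f"{table}.{index or '(heap)'}: {(frag or 0):.1f}% fragmented across {pages} pages")
```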

Building a Health Check with dbatools

David Seis has a dashboard:

After a longer break than expected, I am back to the dbatools for SQL Server DBAs blog series, and this one is a monster! In fact, this project is so big that I think it will be better to release it incrementally. This will allow you to clone the repository, test and modify it in your own environment, and come back for free updates over the weeks and months to come. It is designed to allow you to customize it to your environment’s needs. This first post covers the initial process overview as well as the first set of checks and how to navigate the report.

Click through to see what David has in store.

Query Performance Tabs in Performance Monitor

Erik Darling has a new video:

In this video, I dive into the query performance tabs within my full dashboard, a free and open-source SQL Server monitoring tool. I highlight key features such as the QueryStore subtab, which provides detailed insights into query execution plans and regressions over time, helping you identify and address performance issues more effectively. Additionally, I explain how the custom trace patterns tab offers valuable data for safeguarding against overlooked performance anomalies, ensuring a comprehensive view of your SQL Server’s health.

Click through to see these capabilities.
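If you'd rather poke at the kind of data a Query Store view like that is built on, the runtime stats catalog views are the place to start. A rough sketch via pyodbc, with a placeholder connection string and assuming Query Store is enabled on the target database:

```python
import pyodbc

# Placeholder connection string; point at a database with Query Store enabled.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=YourDatabase;Trusted_Connection=yes;TrustServerCertificate=yes;"
)

# Per-interval runtime stats are what make "regressions over time" visible:
# the same query_id showing a worse avg_duration in a later interval.
sql = """
SELECT TOP (20)
       q.query_id,
       rsi.start_time,
       rs.count_executions,
       rs.avg_duration / 1000.0 AS avg_duration_ms
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_runtime_stats_interval AS rsi
  ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
JOIN sys.query_store_query AS q ON q.query_id = p.query_id
ORDER BY rs.avg_duration DESC;
"""

for query_id, start_time, execs, avg_ms in conn.execute(sql):
    print(f"query {query_id} @ {start_time}: {execs} executions, {avg_ms:.1f} ms avg")
```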

Diagnosing a textConnection() Slowdown in R

Yihui Xie looks into an issue:

Running quarto render on a document with that single chunk took 35 seconds. The equivalent rmarkdown::render() finished in under half a second. As a side note in the issue, the reporter pinged me that the same problem existed in litedown. litedown is independent of both Quarto and knitr; it executes R code through xfun::record(). That is where I started looking.

Click through for the discovery process, explanation, and fix.

User-Defined Functions vs Calculation Groups in DAX

Marco Russo and Alberto Ferrari take a look back at calculation groups:

The introduction of user-defined functions (UDFs) in DAX changes the way we think about code reuse. Before UDFs existed, calculation groups were the only mechanism for sharing common logic across multiple calculations. Many developers adopted calculation groups not because they were the ideal tool for code reuse, but because there was no alternative.

Now that user-defined functions are available, it is time to revisit this practice. User-defined functions and calculation groups serve fundamentally different purposes. Understanding the distinction between the two is essential for building well-organized, efficient semantic models.

Click through for a dive into these two concepts and when to use each.

Smoothed Lines and Data Visualization

Kerry Kolosko digs into data visualization theory:

Power BI development is a relatively straightforward process when managed by one individual start to finish. But when the development process is shared among team members, ways of working need to be established, and common work management frameworks such as agile, lean, HCD, and UI/UX design are adopted.

These frameworks can be useful for teams, but as always, rigid adoption of and adherence to frameworks can cause project inefficiencies. It took a fair bit of corporate learning to acknowledge that applying Agile methodologies to a construction project, or waterfall methodologies to a software project, wasn’t effective.

There’s a lot in here around the pros and cons of various tooling (like wireframing), visual selection, the grammar of graphics, and what smoothed lines actually represent. Smoothed lines are a bit of a hobby horse for me, as those smoothed lines represent a model of the data rather than the actual data, so if you show me the former, you’d better also show the latter.
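To make that concrete, here is a quick, self-contained sketch: noisy synthetic data plus a rolling-mean smoother standing in for whatever smoothing a charting tool applies. The smooth line is a model; plotting the points alongside it is the "show me the latter" part. The data, window size, and smoother choice are all arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.arange(120)
y = np.sin(x / 15) + rng.normal(scale=0.4, size=x.size)  # noisy "actual" data

# A simple smoother: centered rolling mean over 15 points.
window = 15
kernel = np.ones(window) / window
y_smooth = np.convolve(y, kernel, mode="same")

fig, ax = plt.subplots()
ax.plot(x, y, ".", alpha=0.4, label="actual data")                  # the data itself
ax.plot(x, y_smooth, linewidth=2, label="smoothed (rolling mean)")  # a model of the data
ax.legend()
plt.show()
```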

Microsoft Fabric ETL and the Air Traffic Controller

Jens Vestergaard rethinks a metaphor:

In February 2025 I wrote about building an event-driven ETL system in Microsoft Fabric. The metaphor was air traffic control: notebooks as flights, Azure Service Bus as the control tower, the Bronze/Silver/Gold medallion layers as the runway sequence. The whole system existed because Fabric has core-based execution limits that throttle how many Spark jobs run simultaneously on a given capacity SKU.

The post was about working around a constraint. You could not just fire all your notebooks at once. You needed something to manage the queue.

More than a year on, it is worth being honest about what held up and what has changed.

Read on to see what has changed in this past year and how Jens thinks of it today.
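The core constraint Jens describes, that only so many Spark jobs can run at once on a given capacity, is the classic bounded-concurrency problem. Here is a toy sketch of the queueing idea, with plain asyncio standing in for Service Bus and the Fabric notebook runs; the names and the limit of three are illustrative assumptions, not how the original system works.

```python
import asyncio
import random

# Placeholder: in the real system these would be Fabric notebook runs
# triggered by Service Bus messages; here they are just timed sleeps.
async def run_notebook(name: str, gate: asyncio.Semaphore) -> None:
    async with gate:                      # only N "flights" on the runway at once
        print(f"starting {name}")
        await asyncio.sleep(random.uniform(0.5, 2.0))
        print(f"finished {name}")

async def main() -> None:
    gate = asyncio.Semaphore(3)           # stand-in for the capacity SKU's core limit
    jobs = [run_notebook(f"bronze_{i}", gate) for i in range(10)]
    await asyncio.gather(*jobs)

if __name__ == "__main__":
    asyncio.run(main())
```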

Updates to Straight Path Solutions sp_Check Procedures

Jeff Iannucci has some updates:

This month, though, there are mostly a few small updates for the tools, as next month’s updates should also include many of the issues noted in GitHub (thanks for those!). Most of this month’s updates were to sp_CheckSecurity, although we did add “Initial File Size” to the output of sp_CheckTempdb, since that had been requested by a few folks.

We hope these new updates can help you, especially if you’ve never used these stored procedures. Here are the updates for March 2026, with links to the GitHub repositories where you can download the latest versions.

Click through for those links.

Read-Write Ratios in SQL Server Databases

Louis Davidson resurrects an old article:

When I wrote this post, I had a very strong belief that my OLTP databases were write-heavy and my OLAP databases read-heavy. But in many cases, that isn’t exactly the true story. Why? Because you have to read a lot of data to update, delete, and display data. Even inserts have reads, to look for duplicates, for example. OLAP systems should likely see more reads than writes, but loading data can mean a lot of data moved around, and reads may be fewer if the queries are efficient. When I was looking for stuff to post, this one stood out as a post that I really liked (other than the aforementioned title!).

I came upon this idea when I found a post by Jason Massie (whose handle was statisticsio back then) that I had referenced, and Kendal Van Dyke alerted me to this post at SQL Saturday in Atlanta (this would have been back in 2008 or 2009). After reading that (long-gone) post, I wrote this post using sys.dm_db_index_usage_stats, and added references to sys.dm_io_virtual_file_stats as well.

Click through for the script. My expectations going in would be that OLTP and OLAP servers would actually have fairly similar read-write ratios. We talk about OLTP being “write-heavy” and OLAP being “read-heavy” but in practice, you need a lot of write behavior to sustain a warehouse—especially one that isn’t just a “truncate and reload once a day” type of thing—and people care about reading the data from their OLTP systems. In practice, I see the split as more of optimizing for accuracy in data entry (OLTP) versus simplicity of reading data (OLAP).
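If you want to run the same kind of measurement yourself, sys.dm_db_index_usage_stats is the starting point Louis's script uses. A minimal sketch via pyodbc follows; the connection string is a placeholder, and keep in mind that these counters reset when the instance restarts, so the ratio reflects activity since the last restart, not all history.

```python
import pyodbc

# Placeholder connection string; point it at the instance you want to measure.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes;TrustServerCertificate=yes;"
)

# Reads = seeks + scans + lookups; writes = updates, per sys.dm_db_index_usage_stats.
sql = """
SELECT DB_NAME(database_id) AS database_name,
       SUM(user_seeks + user_scans + user_lookups) AS reads,
       SUM(user_updates) AS writes
FROM sys.dm_db_index_usage_stats
GROUP BY database_id
ORDER BY database_name;
"""

for db, reads, writes in conn.execute(sql):
    total = (reads or 0) + (writes or 0)
    read_share = (reads or 0) / total if total else 0.0
    print(f"{db or '(unknown)'}: reads={reads} writes={writes} read_share={read_share:.1%}")
```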
