Press "Enter" to skip to content


Scaling Kafka Streams Applications

The Confluent employee mines have a new article:

As the adoption of real-time data processing accelerates, the ability to scale stream processing applications to handle high-volume traffic is paramount. Apache Kafka®, the de facto standard for distributed event streaming, provides a powerful and scalable library in Kafka Streams for building such applications. 

Scaling a Kafka Streams application effectively involves a multi-faceted approach that encompasses architectural design, configuration tuning, and diligent monitoring. This guide will walk you through the essential strategies and best practices to ensure your Kafka Streams applications can gracefully handle massive throughput.

The post gets into some details around the kinds of limits you’ll hit during scaling, scale-up versus scale-out, and configuration settings to help with that scale.


Use Cases for Window Functions

I have a new video:

In this video, I take you through a variety of use cases for window functions, showing how you can solve common (and sometimes uncommon) business problems efficiently and effectively.

This video builds on the prior two, which showed what the different window functions are and how they work. This one focuses primarily on solving business problems in sometimes-clever ways.
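
To give a flavor of the kinds of problems in play, here is a minimal, generic sketch (not an example taken from the video) of the classic "latest row per group" pattern, using ROW_NUMBER() against a hypothetical Orders table:

-- Hypothetical Orders table: return each customer's most recent order.
WITH NumberedOrders AS
(
    SELECT
        o.CustomerID,
        o.OrderID,
        o.OrderDate,
        o.OrderTotal,
        ROW_NUMBER() OVER (
            PARTITION BY o.CustomerID
            ORDER BY o.OrderDate DESC
        ) AS rn
    FROM dbo.Orders o
)
SELECT
    CustomerID,
    OrderID,
    OrderDate,
    OrderTotal
FROM NumberedOrders
WHERE rn = 1;

The same shape handles de-duplication, latest-status lookups, and most other "top 1 per group" questions without resorting to a self-join.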


Tag-Based Masking in Snowflake

Kevin Wilkie gets tagging:

If you’ve followed our site for a while, you would have seen in a previous post how powerful tag-based masking policies are in Snowflake. They let you enforce consistent data masking rules across columns without constantly rewriting logic. But Snowflake hasn’t stopped there—recent enhancements now make it even easier to classify, tag, and mask data at scale. In this post, we’ll recap the essentials of tag-based masking, highlight the new functionality, and share some practical tips for rolling it out in your environment.
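
If you have not seen tag-based masking in action, here is a minimal sketch of how the pieces fit together (all object names are hypothetical): create a tag and a masking policy, bind the policy to the tag, and any string column carrying that tag is masked automatically.

-- Hypothetical object names; a minimal tag-based masking setup in Snowflake.
CREATE TAG IF NOT EXISTS governance.tags.pii_type;

CREATE MASKING POLICY governance.policies.mask_pii_string
  AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_READER') THEN val
    ELSE '***MASKED***'
  END;

-- Bind the policy to the tag.
ALTER TAG governance.tags.pii_type
  SET MASKING POLICY governance.policies.mask_pii_string;

-- Any string column tagged with pii_type now gets the policy applied.
ALTER TABLE sales.public.customers
  MODIFY COLUMN email
  SET TAG governance.tags.pii_type = 'EMAIL';

The appeal of the tag route is that the policy rides along with the classification, so new columns only need the tag rather than a fresh policy assignment.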

Kevin has a new blog theme and everything.


Set MAXDOP in Azure SQL DB

Brent Ozar has a public service announcement:

In Azure SQL DB, you set max degree of parallelism at the database level. You right-click on the database, go into properties, and set the MAXDOP number.

I say “you” because it really is “you” – this is on you, bucko. Microsoft’s magical self-tuning database doesn’t do this for you.

And where this backfires, badly, is that Azure SQL DB has much, much lower caps on the maximum number of worker threads your database can consume before it gets cut off. 

Click through to see what kind of error message you get and just how low these limits are.
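
Brent covers the properties dialog; if you would rather script the change, the database-scoped configuration route does the same thing (4 below is only a placeholder value; pick what fits your workload):

-- Run inside the target Azure SQL DB database; 4 is just an example value.
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;

-- Confirm the current setting.
SELECT [name], [value]
FROM sys.database_scoped_configurations
WHERE [name] = N'MAXDOP';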


Adding a Drillthrough Button in Power BI

Elena Drakulevska adds a button:

If you’ve been building Power BI reports, you probably know about drillthrough.

In short: drillthrough lets users move from a summary view to a detail page focused on one data point. For example, you can right-click on Austria in a sales chart and jump straight to a page showing visuals and metrics only about Austria.

Sounds powerful, right?

The catch: most users don’t even know it’s been implemented.

The other catch: those of us sad souls using Power BI Report Server don’t get drillthrough at all.


Microsoft Fabric Copy Job Updates

Ye Xu has an update:

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range of scenarios—all through an intuitive, easy-to-use experience.

This update introduces several enhancements, including connection parameterization, expanded CDC capabilities, new connectors, and a streamlined Copy Assistant powered by Copy job.

Read on to see what’s new. Some of the items in this list are preview features, and it looks like others are currently GA.


Finding Rows with Errors in Power Query

Gilbert Quevauvilliers goes around looking for trouble:

In the past, when there has been an error loading data into the semantic model, clicking on View errors can either take a very long time to show those errors or, in some cases, never show the error at all.

In this blog post I am going to show you an alternative way to quickly find the errors.

The column quality data preview option is absolutely worth keeping on at all times.
