
Month: June 2025

Consumer Group Rebalancing in Kafka and KIP-848

Jonathan Lacefield gives us a heads-up:

Historically, Kafka has relied on what we now call the “classic” rebalance protocol. This protocol evolved, as it was initially dominated by the “eager” assignment strategy. Eager rebalancing worked on a stop-the-world principle: Any change in group membership (consumer joining/leaving) or topic metadata triggered a complete halt. All consumers revoked their partitions, a leader computed a new assignment, and partitions were redistributed before processing could resume. This caused significant downtime, especially in dynamic environments.

To mitigate this, the cooperative assignment strategy was introduced within the classic protocol. Cooperative rebalancing reduced downtime by allowing consumers to keep partitions unaffected by the rebalance, revoking only those needing reassignment.
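
To make the difference concrete, here is a toy sketch of the two strategies when a third consumer joins a two-consumer group. It is not Kafka's actual assignor logic: the function names (`sticky_assign`, `revoked_eager`, `revoked_cooperative`) and the simple quota rule are assumptions made purely for illustration.

```python
from typing import Dict, List, Set


def sticky_assign(current: Dict[str, List[int]], consumers: List[str]) -> Dict[str, List[int]]:
    """Toy sticky assignor (hypothetical): each consumer keeps as many of its
    current partitions as its quota allows; leftovers go to whoever is short."""
    partitions = sorted(p for owned in current.values() for p in owned)
    base, extra = divmod(len(partitions), len(consumers))
    quota = {c: base + (1 if i < extra else 0) for i, c in enumerate(consumers)}

    # Keep existing ownership up to the quota.
    new = {c: list(current.get(c, []))[: quota[c]] for c in consumers}

    # Hand out whatever is no longer owned.
    owned = {p for ps in new.values() for p in ps}
    unowned = [p for p in partitions if p not in owned]
    for c in consumers:
        while len(new[c]) < quota[c]:
            new[c].append(unowned.pop(0))
    return new


def revoked_eager(current: Dict[str, List[int]]) -> Set[int]:
    """Eager: every consumer gives up every partition before reassignment."""
    return {p for owned in current.values() for p in owned}


def revoked_cooperative(current: Dict[str, List[int]], new: Dict[str, List[int]]) -> Set[int]:
    """Cooperative: only partitions whose owner changes are revoked."""
    return {
        p
        for consumer, owned in current.items()
        for p in owned
        if p not in new.get(consumer, [])
    }


current = {"c1": [0, 1, 2], "c2": [3, 4, 5]}
new = sticky_assign(current, ["c1", "c2", "c3"])  # c3 joins the group

print("new assignment:", new)
print("eager revokes:", sorted(revoked_eager(current)))
print("cooperative revokes:", sorted(revoked_cooperative(current, new)))
```

In this model, eager rebalancing pauses all six partitions, while the cooperative approach touches only the two that move to the new consumer.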

Read on to learn about some of the challenges that exist with rebalancing, and what KIP-848 promises to do.


Simplifying Calculations with the APPLY Operator

I have a new video:

In this video, I show how we can use the APPLY operator to remove redundancy in the SELECT clause and simplify complex calculations, all with zero performance impact.

This is, as I’ve said in the past, my favorite use case for the APPLY operator. As I’ve become older and (even) more crotchety, I’ve sided more and more with “easy to read” versus “runs faster” for code. And when you get “easy to read” with no impact on “runs faster,” I’m all in.

The accounting scenario I show may be a fairly extreme case, but I’d bet that queries similar to this abound in almost any company. A big part of why complex reporting queries are so complex comes from repetition of calculations.


Invoking Child Pipelines in Microsoft Fabric

Meagan Longoria spots the fork in the road:

At the moment there are two activities in Fabric pipelines that allow you to execute a “child” pipeline. They are both named “Invoke Pipeline” but are differentiated by the labels “Legacy” and “Preview” in parentheses.

Read on to learn more about these two and why choosing the new one may not always be the best option for you, at least not yet.


Populating Microsoft Fabric Data Agents with Semantic Model Synonyms

Marc Lelijveld explains some terms:

It was only yesterday that I wrote a blog post on Semantic Models as a source for Fabric Data Agents. Not much time has passed since I learned that Fabric Data Agents do not (always) respect the Synonyms that have been added to a Semantic Model. As a result, the Data Agent may start creating implicit measures, not respecting the definitions and logic in the explicit measures that are part of the Semantic Model.

Long story short, I think we should be able to do better! Therefore, I created a Notebook that helps you to set up Data Agents, collect additional information from your Semantic Model and populate that information automatically as AI notes to the Data Agent.

Read on for the notebook and some additional explanation.


Native Power BI Write-Back in Microsoft Fabric

Jon Vöge comes full-circle:

Three years ago, write-back to Power BI was my gateway into the Power BI community.

Power Apps embedded into Power BI, enabling write-back to Sharepoint, Azure SQL and Fabric, and sharing those solutions with the community, have always been some of the most fun I’ve had with “work”.

However.

While Power Apps are relatively easy to build, the solution architecture quickly becomes complex, especially when you consider governance, CI/CD and licensing, all of which balloon in size when you are forced to integrate with a new platform (Dataverse/Power Platform) to solve a seemingly small issue in a Power BI report.

Click through to see the new way to do this. It’s been a point of frustration for me that, for so long, it has been such a challenge to allow a user to annotate or augment data in Power BI.


Writing a Python Data Frame to a Lakehouse Table

Gilbert Quevauvilliers continues a series on Python notebooks and DuckDB:

In this blog post I am going to explain how to loop through a data frame to query data and write once to a Lakehouse table.

The example I will use is to loop through a list of dates which I get from my date table, then query an API, append to an existing data frame and finally write once to a Lakehouse table.
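
The loop-accumulate-write-once pattern described above might look roughly like the following sketch. Everything here is a stand-in, not the author's code: `fetch_rows_for_date` fakes the API call, `write_once` fakes the Lakehouse write, and plain lists of dictionaries take the place of the data frame so that the example runs with only the standard library.

```python
from datetime import date, timedelta
from typing import Dict, List


def fetch_rows_for_date(day: date) -> List[Dict]:
    """Hypothetical stand-in for the API call; returns one fake row per date."""
    return [{"date": day.isoformat(), "value": day.day * 10}]


def collect_rows(dates: List[date]) -> List[Dict]:
    """Loop over the dates and accumulate all results in memory."""
    rows: List[Dict] = []
    for day in dates:
        rows.extend(fetch_rows_for_date(day))
    return rows


def write_once(rows: List[Dict], table: List[Dict]) -> int:
    """Hypothetical stand-in for a single Lakehouse write: appends every row
    in one call and returns the number of rows written."""
    table.extend(rows)
    return len(rows)


# In the post, the dates come from a date table; here they are hard-coded.
dates = [date(2025, 6, 1) + timedelta(days=i) for i in range(3)]
rows = collect_rows(dates)

lakehouse_table: List[Dict] = []  # in-memory placeholder for the real table
written = write_once(rows, lakehouse_table)
print(f"wrote {written} rows in a single call")
```

The point of the design is that the write happens once, after the loop, rather than once per date. In a real notebook the accumulated rows would presumably be converted to a data frame before that single write.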

Click through for the code, as well as a sample notebook you can use.


What’s New in Apache Spark 4.0

Ram Ghadiyaram looks at recent updates to Apache Spark:

Hurray! Apache Spark 4.0, released in 2025, redefines big data processing with innovations that enhance performance, accessibility, and developer productivity. With contributions from over 400 developers across organizations like Databricks, Apple, and NVIDIA, Spark 4.0 resolves thousands of JIRA issues, introducing transformative features: native plotting in PySpark, Python Data Source API, polymorphic User-Defined Table Functions (UDTFs), state store enhancements, SQL scripting, and Spark Connect improvements. This report provides an in-depth exploration of these features, their technical underpinnings, and practical applications through original examples and diagrams.

Click through to see what’s on the list of major features.


Apache Spark 3.5 Support in Azure Synapse Analytics

Arshad Ali has an announcement:

You can now create Azure Synapse Runtime for Apache Spark 3.5. The essential changes include features which come from upgrading Apache Spark to version 3.5 and Delta Lake 3.2. Please review the official release notes for Apache Spark 3.5 to check the complete list of fixes and features. In addition, review the migration guidelines between Spark 3.4 and 3.5 to assess potential changes to your applications, jobs and notebooks. 

Credit where credit is due: I’ve made light of the utter lack of work on Azure Synapse Analytics since Microsoft Fabric’s release. But hey, they did a thing. Granted, the impetus behind this was to “prepare for migrating to Microsoft Fabric Spark.”


SQL Server Performance Office Hours

Erik Darling is back with a new episode of office hours:

Do you know of any disadvantages of using a filtered index to filter NULL values? We have a very heavy transactional table, like 10k trans/sec, with a clustered index and one non-clustered index. We don’t have any queries that select rows with NULL values from this table. The DBA team said we should avoid using a filtered index without any proof. What do you think?

Click through for Erik’s answers in video form. I was workshopping a joke around how all of the evidence Erik has of me being mean to him are lies, but couldn’t make it work without riding the line of “Wait…is he serious?”


Controlling Selections in Calculation Groups

Marco Russo and Alberto Ferrari look at calculation groups:

Calculation groups are often used to display options in a report to change the calculation of existing measures by selecting items on a slicer. However, only a single calculation item can be executed for a measure reference, which could make the semantic model harder to use when the user selects two or more items in a calculation group.

Two new calculation group properties, multipleOrEmptySelectionExpression and noSelectionExpression, provide a way to control the calculation in these conditions that, so far, ignored the presence of the calculation group, thus executing the measures without applying any transformation. This article shows how to use these features and provides guidance on using the feature in preview: despite not having a user interface to manage these new properties, the TMDL view in Power BI Desktop and external tools like Tabular Editor already allow you to create and publish a semantic model that uses these new properties.

Read on to see how these properties work.
