Month: June 2025

Simplifying Calculations with the APPLY Operator

I have a new video:

In this video, I show how we can use the APPLY operator to remove redundancy in the SELECT clause and simplify complex calculations, all with zero performance impact.

This is, as I’ve said in the past, my favorite use case for the APPLY operator. As I’ve become older and (even) more crotchety, I’ve sided more and more with “easy to read” versus “runs faster” for code. And when you get “easy to read” with no impact on “runs faster,” I’m all in.

The accounting scenario I show may be a fairly extreme case, but I’d bet that queries similar to this abound in almost any company. A big part of why complex reporting queries are so complex comes from repetition of calculations.

Comments closed

Invoking Child Pipelines in Microsoft Fabric

Meagan Longoria spots the fork in the road:

At the moment there are two activities in Fabric pipelines that allow you to execute a “child” pipeline. They are both named “Invoke Pipeline” but are differentiated by the labels “Legacy” and “Preview” in parentheses.

Read on to learn more about these two and why choosing the new one may not always be the best option for you, at least not yet.

Populating Microsoft Fabric Data Agents with Semantic Model Synonyms

Marc Lelijveld explains some terms:

It was only yesterday, that I wrote a blog post on Semantic Models as a source for Fabric Data Agents. Not much time has passed, since I learned that Fabric Data Agents does not (always) respect the Synonyms that have been added to a Semantic Model. As a result, the Data Agent may start creating implicit measures, not respecting the definitions and logic in the explicit measures that are part of the Semantic Model.

Long story short, I think we should be able to do better! Therefore, I created a Notebook that helps you to setup Data Agents, collect additional information from your Semantic Model and populate that information automatically as AI notes to the Data Agent.

Read on for the notebook and some additional explanation.

Native Power BI Write-Back in Microsoft Fabric

Jon Vöge comes full-circle:

Three years ago, write-back to Power BI was my gateway into the Power BI community.

Power Apps embedded into Power BI, enabling write-back to Sharepoint, Azure SQL and Fabric, and sharing those solutions with the community, have always been some of the most fun I’ve had with “work”.

However.

While Power Apps are relatively easy to build, the solution architecture quickly becomes complex. Especially when you consider governance, CI/CD and licensing, all of which balloons in size when you are forced to integrate with a new platform (Dataverse/Power Platform) to solve a seemingly small issue in a Power BI report.

Click through to see the new way to do this. It’s been a point of frustration for me that, for so long, it has been such a challenge to allow a user to annotate or augment data in Power BI.

Writing a Python Data Frame to a Lakehouse Table

Gilbert Quevauvilliers continues a series on Python notebooks and DuckDB:

In this blog post I am going to explain how to loop through a data frame to query data and write once to a Lakehouse table.

The example I will use is to loop through a list of dates which I get from my date table, then query an API, append to an existing data frame and finally write once to a Lakehouse table.

Click through for the code, as well as a sample notebook you can use.
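
The loop-then-write-once pattern described in the excerpt can be sketched roughly as follows. This is only an illustration and not Gilbert's code: `fetch_rows_for_date`, `collect_rows`, and `write_once` are hypothetical names, the API call is replaced by a stub that returns made-up rows, and a plain Python list stands in for both the data frame and the Lakehouse table so that the sketch runs anywhere.

```python
from datetime import date, timedelta


def fetch_rows_for_date(day):
    """Hypothetical stand-in for the API call; returns a list of row dicts."""
    return [{"date": day.isoformat(), "value": day.day * 10}]


def collect_rows(days, fetch=fetch_rows_for_date):
    """Loop over the dates, query once per date, and accumulate all rows in memory."""
    rows = []
    for day in days:
        rows.extend(fetch(day))
    return rows


def write_once(rows, writer):
    """Hand the full batch to the writer in a single call and return the row count."""
    writer(rows)
    return len(rows)


# Stand-in for the date table: three consecutive days.
start = date(2025, 6, 1)
days = [start + timedelta(days=i) for i in range(3)]

# Stand-in for the Lakehouse table: a list that records each write call.
write_calls = []
all_rows = collect_rows(days)
written = write_once(all_rows, write_calls.append)
print(f"{written} rows written in {len(write_calls)} write call(s)")
```

The point of the sketch is that the write happens outside the loop, so the target sees one batch rather than one small write per date. In a real notebook the accumulated rows would become a data frame and `writer` would be whatever Lakehouse write call your environment provides.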

What’s New in Apache Spark 4.0

Ram Ghadiyaram looks at recent updates to Apache Spark:

Hurray! Apache Spark 4.0, released in 2025, redefines big data processing with innovations that enhance performance, accessibility, and developer productivity. With contributions from over 400 developers across organizations like Databricks, Apple, and NVIDIA, Spark 4.0 resolves thousands of JIRA issues, introducing transformative features: native plotting in PySpark, Python Data Source API, polymorphic User-Defined Table Functions (UDTFs), state store enhancements, SQL scripting, and Spark Connect improvements. This report provides an in-depth exploration of these features, their technical underpinnings, and practical applications through original examples and diagrams.

Click through to see what’s on the list of major features.

Apache Spark 3.5 Support in Azure Synapse Analytics

Arshad Ali has an announcement:

You can now create Azure Synapse Runtime for Apache Spark 3.5. The essential changes include features which come from upgrading Apache Spark to version 3.5 and Delta Lake 3.2. Please review the official release notes for Apache Spark 3.5 to check the complete list of fixes and features. In addition, review the migration guidelines between Spark 3.4 and 3.5 to assess potential changes to your applications, jobs and notebooks. 

Credit where credit is due: I’ve made light of the utter lack of work on Azure Synapse Analytics since Microsoft Fabric’s release. But hey, they did a thing. Granted, the impetus behind this was to “prepare for migrating to Microsoft Fabric Spark.”

SQL Server Performance Office Hours

Erik Darling is back with a new episode of office hours:

Do you know of any disadvantages of using a filtered index to filter NULL values? We have a very heavy transactional table, like 10k trans/sec, with a clustered index and one non-clustered index. We don’t have any queries that select rows with NULL values from this table. The DBA team said we should avoid using a filtered index without any proof. What do you think?

Click through for Erik’s answers in video form. I was workshopping a joke around how all of the evidence Erik has of me being mean to him are lies, but couldn’t make it work without riding the line of “Wait…is he serious?”

Controlling Selections in Calculation Groups

Marco Russo and Alberto Ferrari look at calculation groups:

Calculation groups are often used to display options in a report to change the calculation of existing measures by selecting items on a slicer. However, only a single calculation item can be executed for a measure reference, which could make the semantic model harder to use when the user selects two or more items in a calculation group.

Two new calculation group properties, multipleOrEmptySelectionExpression and noSelectionExpression, provide a way to control the calculation in these conditions that, so far, ignored the presence of the calculation group, thus executing the measures without applying any transformation. This article shows how to use these features and provides guidance on using the feature in preview: despite not having a user interface to manage these new properties, the TMDL view in Power BI Desktop and external tools like Tabular Editor already allow you to create and publish a semantic model that uses these new properties.

Read on to see how these properties work.

Date and Time Data Types in MySQL and PostgreSQL

Aisha Bukar compares and contrasts:

MySQL and PostgreSQL offer several data types that can be used for handling dates and times. These data types provide the tools to store and manage information like dates of a particular event, timestamps, and even time durations. While they both share some similarities on how they handle date and time, there are key differences in how they handle precision, time zones, and date/time calculations.

Getting date and time data right is key for keeping databases accurate and useful. In this article, we will compare how MySQL and PostgreSQL handle date and time data, their differences, strengths, and which one might work better for your needs. By the end, you’ll have a clearer idea of which database to choose for managing date and time information.

Click through to learn about the two platforms.
