Category: Performance Tuning

An Overview of the Fabric Native Execution Engine

Ankita Victor-Levi introduces a new processing model:

In today’s data landscape, as organizations scale their analytical workloads, the demand for faster, more cost-efficient computation continues to rise. Apache Spark has long been the backbone of large-scale data processing, with its in-memory execution and powerful APIs, but today’s workloads demand even better performance.

Microsoft Fabric addresses this challenge with the Native Execution Engine—a vectorized, C++-powered execution layer that accelerates Spark jobs with no code changes, reduced runtime, and no additional compute cost. This blog post will take you behind the scenes to give an overview of how the engine works and how it delivers performance gains while preserving the familiar Spark developer experience users already know and love.

Read on to learn more about its capabilities and current limitations.
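The engine is opt-in per environment or session. As a sketch (the property name `spark.native.enabled` is taken from Microsoft's Fabric documentation—verify it against your runtime), a notebook can request it at session start with a `%%configure` cell whose body looks like:

```json
{
    "conf": {
        "spark.native.enabled": "true"
    }
}
```

Because the engine falls back to regular Spark execution for unsupported operators, enabling it should be safe to test on existing jobs.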

Spark Schema Inference in Production

Miles Cole shares some advice:

To show the impact, I want to highlight a benchmark that ran Fabric Spark against a single 19GB CSV input file (100M Contoso dataset, sales table). While there were a number of issues with this benchmark that inadvertently made Spark appear slow, this post focuses only on the impact of inferring schema, along with practical recommendations.

Read on to see a performance problem that schema inference brings up. I’d also mention the risk of data updates blowing up your well-laid plans. Schema inference is a double-edged sword: it can be convenient and open up new approaches to development, but it can just as easily cause unexpected failures.
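To make the double-edged-sword point concrete, here is a small pure-Python sketch (illustrative only, not Spark's actual inference code) of why inference requires a full pass over the data and how one bad row in tomorrow's file can silently change a column's inferred type:

```python
def infer_column_types(rows):
    """Naive per-column type inference: int -> float -> string.
    A full pass over every row is required, which is the hidden cost
    Spark pays when inferSchema forces an extra read of the file."""
    types = {}
    for row in rows:
        for col, value in row.items():
            current = types.get(col, "int")
            if current == "string":
                continue  # already the widest type; nothing to learn
            try:
                int(value)
                types[col] = current  # still fits the current numeric type
            except ValueError:
                try:
                    float(value)
                    types[col] = "float"
                except ValueError:
                    types[col] = "string"
    return types

day1 = [{"qty": "3"}, {"qty": "7"}]
day2 = [{"qty": "3"}, {"qty": "N/A"}]  # one bad row silently changes the schema
print(infer_column_types(day1))  # {'qty': 'int'}
print(infer_column_types(day2))  # {'qty': 'string'}
```

With a 19GB CSV, the extra read alone is expensive; the schema drifting between loads is the part that breaks downstream code. An explicit schema avoids both.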

Measuring Page Load Times in Power BI

Chris Webb breaks out the stopwatch:

If you’re performance tuning a Power BI report, the most important thing you need to measure – and the thing your users certainly care about most – is how long it takes for a report page to load. Yet this isn’t something that is available anywhere in Power BI Desktop or in the Service (though you can use browser dev tools to do this), and developers often concentrate on tuning just the individual DAX queries generated by the report instead. Usually that’s all you need to do, but running multiple DAX queries concurrently can affect the performance of each one, and there are other factors (for example, geocoding in map visuals or displaying images) that affect report performance, so if you do not look at overall page render times you might miss them. In this post I’ll show you how you can measure report page load times, and the times taken for other forms of report interaction, using Performance Analyzer in the Service and Power Query.

Read on to see how.
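Chris works with Performance Analyzer's exported data. As a rough companion sketch (the `events`/`start`/`end` field names are my assumption about the export's JSON shape—check your own export file), the page-load figure is simply the span from the earliest event start to the latest event end:

```python
import json
from datetime import datetime

def page_load_seconds(perf_json: str) -> float:
    """Rough page-load time from a Performance Analyzer export:
    the span from the earliest event start to the latest event end,
    which captures concurrency effects a single query duration misses."""
    data = json.loads(perf_json)
    starts, ends = [], []
    for event in data["events"]:
        if "start" in event and "end" in event:
            starts.append(datetime.fromisoformat(event["start"]))
            ends.append(datetime.fromisoformat(event["end"]))
    return (max(ends) - min(starts)).total_seconds()

# Two overlapping events: the page took 2.0s end to end, even though
# neither individual event lasted that long.
sample = json.dumps({"events": [
    {"name": "Execute DAX Query", "start": "2024-01-01T10:00:00.000",
     "end": "2024-01-01T10:00:01.500"},
    {"name": "Visual Display", "start": "2024-01-01T10:00:01.200",
     "end": "2024-01-01T10:00:02.000"},
]})
print(page_load_seconds(sample))  # 2.0
```

The same min-start/max-end logic is easy to express in Power Query over the imported event table, which is the route the post takes.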

Occasional Query Failures on a Small Table

Paul Randal troubleshoots an issue:

The table in question only had a few million rows of data in it, with a maximum row size of 60 bytes, and the query usually ran in a few seconds, but occasionally the query would ‘hang’ and would either be killed or take tens of minutes to run. Troubleshooting instrumentation when the issue happened showed no out-of-the-ordinary waits occurring, no pressure on the server, and the query plan generated when the query took a long time was essentially the same.

The only thing noticeable was that when the problem occurred, a column statistics update happened as part of query compilation, but with such a tiny table, how could that be the root cause of the issue? The calculated disk space for the row size and count worked out to be about 250MB, but with a statistics sample rate of only 4%, extended events showed an auto_stats event taking close to an hour!

Read on to learn the cause. I will admit that I did not get this one correct when I guessed what the cause could be.
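The numbers quoted make the mystery sharper. A quick back-of-the-envelope calculation (mine, not Paul's; real sampling happens at page level, so treat it as order-of-magnitude) shows a 4% sample of a 250MB table is only about 10MB:

```python
# How much data should a 4% statistics sample touch on a 250MB table?
TABLE_MB = 250
PAGE_KB = 8            # SQL Server data page size
SAMPLE_RATE = 0.04

total_pages = TABLE_MB * 1024 // PAGE_KB        # 32,000 pages
sampled_pages = int(total_pages * SAMPLE_RATE)  # 1,280 pages
sampled_mb = sampled_pages * PAGE_KB / 1024     # 10.0 MB

print(total_pages, sampled_pages, sampled_mb)   # 32000 1280 10.0
```

Reading roughly 10MB should take a fraction of a second on any reasonable storage, so an hour-long auto_stats event clearly is not about data volume—which is exactly what makes the actual cause worth reading.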

Transaction ID Locking

Hugo Kornelis disentangles two new features in SQL Server 2025:

One of these two features is Transaction ID (TID) Locking. It is slated to end the memory waste of thousands of individual row locks, as well as the concurrency killer of lock escalation. What is it, how does it work, what are its limitations, and do we really get a free lunch?

Click through for the video, though I am firmly wedded to the idea that TANSTAAFL. I say this without spoiling any part of the video.
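For a sense of the memory argument: SQL Server lock structures are commonly cited at roughly 96 bytes apiece (treat that figure, and this arithmetic, as approximate and mine rather than Hugo's), so a transaction touching a million rows holds on the order of 90MB in row locks, versus a single lock on the transaction ID:

```python
# Rough scale of the memory TID locking aims to save.
BYTES_PER_LOCK = 96          # commonly cited lock-structure size; approximate
rows_modified = 1_000_000

row_lock_mb = rows_modified * BYTES_PER_LOCK / 1024**2
tid_lock_mb = BYTES_PER_LOCK / 1024**2  # one lock on the transaction ID

print(f"{row_lock_mb:.1f} MB vs {tid_lock_mb:.6f} MB")
```

That gap is also why lock escalation exists at all—and why replacing escalation with a single TID lock is such an interesting trade.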

Sub-Transactions and PostgreSQL Performance

Shane Borden lays out an argument:

The short answer is always “maybe”. However, in the following post, I hope to demonstrate what creates a sub-transaction and what happens to overall transaction ID utilization when they are invoked. I will also show how performance is affected when there are lots of connections creating and consuming sub-transactions.

Click through for some testing and results.
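For context, sub-transactions come from `SAVEPOINT` commands and PL/pgSQL `EXCEPTION` blocks, and each one that writes gets its own transaction ID. A toy model of the per-backend limit involved (the 64-entry figure is `PGPROC_MAX_CACHED_SUBXIDS` in the PostgreSQL source; the model is mine, not Shane's test):

```python
# Toy model of one PostgreSQL backend's subtransaction XID cache.
# Past 64 cached subxids, other sessions must consult pg_subtrans (an SLRU)
# for visibility checks, which is where the performance cliff tends to appear.
PGPROC_MAX_CACHED_SUBXIDS = 64

def subxid_cache_overflows(savepoints_per_txn: int) -> bool:
    """True if a transaction issuing this many XID-assigning savepoints
    overflows the per-backend subxid cache."""
    return savepoints_per_txn > PGPROC_MAX_CACHED_SUBXIDS

print(subxid_cache_overflows(64))  # False: still cached
print(subxid_cache_overflows(65))  # True: visibility checks spill to pg_subtrans
```

The overflow is per backend, so many connections each holding a long savepoint-heavy transaction compound the effect.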

Reducing Row Count Estimation Errors in PostgreSQL

Shinya Kato lays out four approaches:

PostgreSQL’s query planner relies on table statistics to estimate the number of rows (estimated rows) each operation will process, and then selects an optimal execution plan based on these estimates. When the estimated rows diverge significantly from the actual rows, the planner can choose a suboptimal plan, leading to severe query performance degradation.

This article walks through four approaches I used to reduce row count estimation errors, ordered from least to most invasive. Due to confidentiality constraints, I cannot share actual SQL or execution plans, so the focus is on the diagnostic thought process and the techniques applied.

Click through for those thought processes.
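Whichever approach you take, a useful first step is quantifying the misestimation per plan node. A small sketch (my regex against PostgreSQL's textual `EXPLAIN ANALYZE` output; note that actual rows are reported per loop, hence the multiplication):

```python
import re

def estimation_errors(plan_text: str):
    """Pull (estimated, actual) row counts from EXPLAIN ANALYZE text output
    and report the misestimation ratio per plan node."""
    pattern = re.compile(
        r"rows=(\d+)[^)]*\) \(actual [^)]*rows=(\d+) loops=(\d+)"
    )
    errors = []
    for est, act, loops in pattern.findall(plan_text):
        est, act = int(est), int(act) * int(loops)  # actual rows are per loop
        ratio = (act / est) if est else float("inf")
        errors.append((est, act, ratio))
    return errors

plan = ("Seq Scan on orders  (cost=0.00..35.50 rows=2550 width=4) "
        "(actual time=0.010..0.812 rows=10 loops=1)")
print(estimation_errors(plan))  # estimated 2550 vs actual 10: a 255x overestimate
```

Nodes whose ratio is far from 1 in either direction are the ones worth attacking with the article's techniques.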

Testing the Performance of Direct Lake vs Import Mode for Semantic Models

Gilbert Quevauvilliers performs some testing:

In this blog post I am going to show you how I completed the automated testing, and then the results, where I compare Direct Lake, Import, and DirectQuery to see which one appears to be the best.

As always, your testing may vary or differ from my tests below.

I would highly recommend that you use the method I have used and apply this testing technique to your data to understand and validate which semantic model would be best for your capacity.

Click through for details on the tests, query durations, and how the three major modes of data loading into Microsoft Fabric semantic models (Import, Direct Lake, Direct Query) fare.
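If you want to adapt the approach, the skeleton of such a harness is simple. This is a generic sketch, not Gilbert's code: `run_query` stands in for however you execute DAX against the model under test (for example, via an XMLA endpoint or semantic link):

```python
import statistics
import time

def benchmark(run_query, queries, runs=5):
    """Run each named query several times and report the median duration.
    Medians blunt the effect of a slow first (cold-cache) run; keep the
    individual runs too if you want to compare cold vs. warm behavior."""
    results = {}
    for name, query in queries.items():
        durations = []
        for _ in range(runs):
            start = time.perf_counter()
            run_query(query)
            durations.append(time.perf_counter() - start)
        results[name] = statistics.median(durations)
    return results

# Stand-in executor for illustration; in real use, run_query would send
# the DAX text to each semantic model (Import, Direct Lake, DirectQuery).
timings = benchmark(lambda dax: sum(range(100_000)),
                    {"total sales": "EVALUATE ..."}, runs=3)
print(timings)
```

Run the same query set against each storage mode and compare the medians side by side.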

Dealing with Long-Running I/O Requests in SQL Server

Rebecca Lewis has a two-parter. First up is finding instances of long-running I/O Requests:

When diagnosing storage or latency issues, one SQL Server message comes up more than most:

“SQL Server has encountered X occurrence(s) of I/O requests taking longer than 15 seconds to complete on file…”

Where X might be 1, 5 or 50, and it could list a file from any one of your databases. When you see this, the next good question is when did it happen and where.

And then the question is, what do you do about it? Rebecca provides some guidance:

In a previous post, I shared a script to detect the “I/O requests taking longer than 15 seconds” warning across your SQL Server inventory. Now let’s talk about what to do when you find it.

Here are five of the most common causes with some tips to investigate each:

The neat part is, it’s not always due to slow storage or bad hardware.
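If you want a quick starting point before reading Rebecca's script, scanning error-log text for the message is straightforward. The regex below is mine, built from the message quoted above, not her inventory script:

```python
import re

# Matches the 15-second I/O warning and captures the occurrence count and file.
PATTERN = re.compile(
    r"SQL Server has encountered (\d+) occurrence\(s\) of I/O requests "
    r"taking longer than 15 seconds to complete on file \[(.+?)\]"
)

def slow_io_events(log_lines):
    """Return (occurrences, file_path) for each matching error-log line."""
    events = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            events.append((int(match.group(1)), match.group(2)))
    return events

sample = [
    r"2024-05-01 03:12:44.10 spid10s SQL Server has encountered 5 occurrence(s) "
    r"of I/O requests taking longer than 15 seconds to complete on file "
    r"[D:\Data\Sales.mdf] in database [Sales] (7).",
]
print(slow_io_events(sample))  # [(5, 'D:\\Data\\Sales.mdf')]
```

Grouping the hits by file and timestamp tells you whether it is one database, one drive, or the whole I/O path that is suffering.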

Measuring Time to Display an Image in Power BI

Chris Webb breaks out the stopwatch:

Carrying on my series on troubleshooting Power BI performance problems with Performance Analyzer, another situation where a report may be slow even when the DAX queries it generates against the underlying semantic model are fast is when you have large images displayed in an Image visual. Let’s see an example.

Click through for that example. And maybe don’t plop in so many 25 MB images.
