Press "Enter" to skip to content

Category: Performance Tuning

Comparing Spark Application Performance in Microsoft Fabric

Jenny Jiang announces a new capability:

The Spark Applications Comparison feature is now in preview in Microsoft Fabric. This new capability empowers developers and data engineers to analyze, debug, and optimize Spark performance across multiple application runs—whether you’re tracking changes from code updates or data variations to improve performance.

The image in the blog post is pretty small and hard to read, but I do wonder if (or how well) it will capture cases where you’re twiddling your thumbs waiting on compute so that you can execute your code. That seems to be a big problem at times.


More Fun with Page Latches

Jared Poche continues a series on page latches:

In my previous blog, I set up a database with two tables, one with a large CHAR(8000) field and one with a smaller VARCHAR(100) field. Both tables use an INT IDENTITY column for their primary key. Since we’ll be inserting rows sequentially, we will see page latch contention when multiple threads attempt to insert.

We ran some initial tests with SQLQueryStress to create some page latch contention and resolved an odd problem causing connection delays.

We’ll use these two tables and test several different approaches to reduce page latch contention.

Jared shows the results for a variety of different tests and even has an embedded Excel spreadsheet, which is how you know he’s done his homework.
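For context, here is a minimal sketch of the kind of setup Jared describes; the table and column names below are mine, not his.

-- Wide rows: roughly one row per 8 KB page, so each insert tends to land on a fresh page
CREATE TABLE dbo.LargeRows
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Payload CHAR(8000) NOT NULL
);

-- Narrow rows: many rows per page, so concurrent inserts pile onto the same trailing page
CREATE TABLE dbo.SmallRows
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Payload VARCHAR(100) NOT NULL
);

-- Run this from many concurrent sessions (e.g., via SQLQueryStress) to see PAGELATCH_EX waits
INSERT INTO dbo.SmallRows (Payload) VALUES (REPLICATE('x', 100));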


The Downside of Sticking to the Legacy Cardinality Estimator

Stephen Planck recommends taking the plunge:

Cardinality estimation (CE) is how the optimizer predicts the number of rows that will flow through each operator in a plan. Those estimates drive cost, join choices, memory grants, and ultimately latency and resource usage. SQL Server has shipped multiple CE models over time. The pre-2014 model—commonly called the legacy CE—dates back to SQL Server 7.0. Starting in SQL Server 2014, Microsoft introduced a new CE and has continued refining it in later releases, including SQL Server 2022. Keeping the legacy CE turned on in SQL Server 2022 is usually the wrong long-term choice.

One thing to note is that the “new” cardinality estimator has been out for a decade. It’s not really that new anymore, and it’s not going anywhere anytime soon. Yes, there are still trade-offs where some queries perform better on the legacy estimator, but if that’s your reason for not using the new one, what have you done in the past decade to tune those queries to work better on it? If the answer is “nothing,” the cardinality estimator isn’t the one at fault.
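For anyone making the switch, the usual pattern is to move the database to the modern estimator and keep a per-query escape hatch for the handful of statements that genuinely regress while you tune them. A rough sketch, using a hypothetical query:

-- Database-wide: use the modern cardinality estimator
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = OFF;

-- Per-query: opt a known regression back into the legacy model while you work on it
DECLARE @CustomerId INT = 42;

SELECT o.OrderId, o.OrderDate
FROM dbo.Orders AS o   -- hypothetical table
WHERE o.CustomerId = @CustomerId
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));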


Ordered Insert Optimization in OrioleDB

Alexander Korotkov deals with hot page issues:

When many sessions try to insert into the same B-tree leaf page, classic exclusive page locking serializes progress and wastes time on sleep/wake cycles. We’re introducing a batch page insertion path that lets the session holding the page lock insert for itself and its neighbors. The result: dramatically reduced lock waits, and big gains at high client counts (2X throughput boost starting from 64 clients in our benchmark).

Click through to see how it differs from normal PostgreSQL behavior on what is one of the more common performance problems on busy OLTP systems.
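For reference, the workload shape in question is simply many sessions appending to the tail of an ordered index. A minimal sketch, assuming the orioledb extension is installed and loaded:

CREATE EXTENSION IF NOT EXISTS orioledb;

-- OrioleDB tables are created through PostgreSQL's table access method interface
CREATE TABLE events
(
    id      bigserial PRIMARY KEY,  -- monotonically increasing key: inserts all target the rightmost leaf
    payload text NOT NULL
) USING orioledb;

-- Run from many concurrent sessions (e.g., via pgbench) to generate contention on that hot page
INSERT INTO events (payload) VALUES ('some payload');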


Troubleshooting Power BI Report Performance

Ben Richardson speeds up a report:

Slow Power BI reports waste time and frustrate users.

The slowdown often comes from hidden issues in the data model, DAX, visuals, or refresh settings.

This guide shows you how to find the cause, fix it, and keep reports running smoothly.

Read on to see some of the built-in tooling for performance optimization, as well as more information on these four common issues.


An Introduction to Query Folding in Power BI

Alex Powers takes us through one major performance optimization technique in Power BI:

One of the most powerful capabilities of Power Query and the M Language is Query Folding (also referred to as query delegation, and predicate push-down). Query Folding allows the Power Query Mashup Engine to push the transformations expressed in an M (mashup) query to the data source, in the data source’s query language, resulting in more efficient data processing.

For inexperienced database technology users this ability to leverage the graphical user interface of Power Query to dynamically generate a query written in the data source’s query language unlocks enormous opportunities to find insights with any data, at any scale.

Click through to see how you can tell whether query folding is happening, as well as some hints around when and to what extent query folding will work.
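To make that concrete: if you filter rows and remove columns in the Power Query editor against a SQL Server source and those steps fold, the work arrives at the server as native SQL rather than being done by the Mashup Engine after a full read. This is only an illustration of the kind of statement you might see, with made-up table and column names:

-- Roughly what the source receives when "keep rows where Amount > 100"
-- and "remove other columns" both fold
select [_].[OrderDate], [_].[Amount]
from [dbo].[Sales] as [_]
where [_].[Amount] > 100;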


Tips for Solving SSIS Package Bottlenecks

Andy Brownsword has some advice:

Last time out we started to look at optimising SSIS packages by showing how to identify bottlenecks with a handy script. This time we’re turning insights into action to solve those pain points.

The solutions are grouped into 3 areas: Data Flows, as they do a lot of heavy lifting; the Execute SQL task, which can also be used for transformation and calculations; and finally everything else (because the first two are usually the issue).

Andy has some good advice and plenty of links to prior content around optimizing SSIS performance. One small thing I’d add is architectural: think about whether you can solve the slow part inside SQL Server. If you’re grabbing a huge amount of data from a SQL Server instance and then narrowing it down with filters, it might be a lot faster to turn that into a SQL query with a stronger WHERE clause. But suppose there’s some small file you’re using to filter, so it seems like you need to pull all of the data out of SQL Server to compare against that file. Instead of pulling everything out of SQL Server or setting up a Lookup component to hit the SQL Server instance for each row in the file, how about loading that file into SQL Server and then writing a query to do the work?

In short, the database engine is typically going to be much better at this kind of work than an integration layer would be.
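A rough sketch of that pattern, with hypothetical table, column, and file names: stage the file once, then let the database engine do the filtering.

-- Staging table for the small lookup file
CREATE TABLE dbo.FilterKeys (KeyValue INT NOT NULL PRIMARY KEY);

-- Load the file once (BULK INSERT, bcp, or a small SSIS data flow all work here)
BULK INSERT dbo.FilterKeys
FROM 'C:\imports\filter_keys.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);

-- The filtering now happens inside the engine instead of row-by-row lookups in SSIS
SELECT b.KeyValue, b.SomeColumn, b.SomeOtherColumn
FROM dbo.BigTable AS b
    INNER JOIN dbo.FilterKeys AS f
        ON f.KeyValue = b.KeyValue;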


Tips for Troubleshooting PostgreSQL Performance Slowdowns

Umair Shahid shares a few tips with us:

If you are a technology leader overseeing a team of developers who manage PostgreSQL as part of a broader application stack, or you are responsible for uptime and customer satisfaction at scale, knowing where to look first can make all the difference.

Let us walk through a focused checklist of patterns and places that commonly hold the key to unlocking better PostgreSQL performance.

This is a very high-level set of reminders regarding where you should look, rather than a detailed troubleshooting guide. But sometimes, it’s good to have that reminder.
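If you want a couple of concrete first stops to pair with that checklist (my addition, not Umair’s): pg_stat_activity shows what is running and waiting right now, and pg_stat_statements, if the extension is installed, shows the cumulative heavy hitters.

-- What is running or waiting right now?
SELECT pid, state, wait_event_type, wait_event, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;

-- Which statements consume the most total time? (column names as of PostgreSQL 13+)
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;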
