Performance Tuning – Page 6

This post will lean heavily on the First Responder Kit from Brent Ozar, sp_WhoIsActive from Adam Machanic and others. They took what Microsoft provides and made them better. Hundreds or thousands of hours of work to make FREE things for you to use.

This is the most complex blog I’ve ever written. Your experiences may differ, and my code samples might have some bugs. Test first.

Kevin is trying to sum up a lot of material in a blog post, but this serves as a pretty good starting point for people who don’t know where to begin. One of the key takeaways from the post, and the entire series, is to have a plan in place. Regardless of whether you use what Kevin has as-is or make your own modifications, knowing what to do and how to do it will save a lot of time and energy when you need to act.

Comments closed

Narrowing down Slowdown Causes in SQL Server

Published 2025-10-02 by Kevin Feasel

Kevin Hill continues a series on solving the age-old “The server is slow!” problem:

At this point you’ve:

Defined what “slow” means and built a timeline (Part 1).

Checked things outside SQL Server like network, storage, and VM noise (Part 2).

Now it’s time to open the hood on SQL Server itself.

I think Kevin’s checklist is a pretty solid one for the type of client he often deals with: one without an in-house DBA or the expertise to stay on top of server problems.

Comments closed

Splitting GUIDs into Multiple BIGINTs for Columnstore

Published 2025-09-30 by Kevin Feasel

Forrest McDaniel performs an experiment:

You may have run into issues with GUIDs as clustering keys, but another major problem is in columnstore. Smart people at Microsoft wrote columnstore in a way to take advantage of modern CPU features, but those CPU features don’t play well with datatypes larger than 8 bytes. Which includes GUIDs.

Read on for the demonstration of this, a clever workaround, and the ramifications of splitting GUIDs into two BIGINTs. Full points for cleverness, though like Forrest, I wouldn’t want to use this in production.

Comments closed

Scaling On-Prem Vector Search with Ollama and Nginx

Published 2025-09-29 by Kevin Feasel

Anthony Nocentino solves a problem:

When you call out to an external embedding service from T-SQL via REST over HTTPS, you’re limited by the throughput of that backend. If you’re running a single Ollama instance, you’ll quickly hit a ceiling on how fast you can generate embeddings, especially for large datasets. I recently attended an event and discussed this topic. My first attempt at generating embeddings was for a three-million-row table. I had access to some world-class hardware to generate the embeddings. When I arrived at the lab and initiated the embedding generation process for this dataset, I quickly realized it would take approximately 9 days to complete. Upon closer examination, I found that I was not utilizing the GPUs to their full potential; in fact, I was only using about 15% of one GPU’s capacity. So I started to cook up this concept in my head, and here we are, load balancing embedding generation across multiple instances of ollama to more fully utilize the resources.

Click through for the solution.

Comments closed

Memory Grant Feedback Woes

Published 2025-09-26 by Kevin Feasel

Rebecca Lewis explains an issue with memory grant feedback in SQL Server:

Before SQL Server runs a query, it estimates how much memory it needs for sorting and joining. But what if it gets it wrong?

Too little memory → Spills to tempdb (slow)

Too much memory → Starves other queries

SQL Server 2017+ tries to fix bad estimates based on previous calls with ‘Memory Grant Feedback’. Kinda like: ‘Last time I gave you 2GB but you only used 50MB. Next time I’m giving you less.’

This is great in theory, though Rebecca shows a case in which the end result might not be great.

Comments closed

Tracking Performance Problems with Windows Performance Recorder

Published 2025-09-25 by Kevin Feasel

Erik Darling digs into Windows Performance Recorder. This is definitely a more advanced tool for analysis, and Erik focuses on call stack time for two given queries. Windows Performance Recorder isn’t something I’ve really used before, though Erik’s demo does show some of the power of the tool.

Comments closed

Comparing Spark Application Performance in Microsoft Fabric

Published 2025-09-24 by Kevin Feasel

Jenny Jiang announces a new capability:

The Spark Applications Comparison feature is now in preview in Microsoft Fabric. This new capability empowers developers and data engineers to analyze, debug, and optimize Spark performance across multiple application runs—whether you’re tracking changes from code updates or data variations to improve performance.

The image in the blog post is pretty small and hard to read, but I do wonder if (or how well) it will capture cases where you’re twiddling your thumbs to get a machine so that you can execute your code. This seems to be a big problem sometimes.

Comments closed

More Fun with Page Latches

Published 2025-09-03 by Kevin Feasel

Jared Poche continues a series on page latches:

In my previous blog, I set up a database with two tables, one with a large CHAR(8000) field and one with a smaller VARCHAR(100) field. Both tables use an INT IDENTITY column for their primary key. Since we’ll be inserting rows sequentially, we will see page latch contention when multiple threads attempt to insert.

We ran some initial tests with SQLQueryStress to create some page latch contention and resolved an odd problem causing connection delays.

We’ll use these two tables and test several different approaches to reduce page latch contention.

Jared shows the results for a variety of different tests and even has an embedded Excel spreadsheet, which is how you know he’s done his homework.

Comments closed

The Downside of Sticking to the Legacy Cardinality Estimator

Published 2025-09-02 by Kevin Feasel

Stephen Planck recommends taking the plunge:

Cardinality estimation (CE) is how the optimizer predicts the number of rows that will flow through each operator in a plan. Those estimates drive cost, join choices, memory grants, and ultimately latency and resource usage. SQL Server has shipped multiple CE models over time. The pre-2014 model—commonly called the legacy CE—dates back to SQL Server 7.0. Starting in SQL Server 2014, Microsoft introduced a new CE and has continued refining it in later releases, including SQL Server 2022. Keeping the legacy CE turned on in SQL Server 2022 is usually the wrong long-term choice.

One thing to note is that the “new” cardinality estimator has been out for a decade. It’s not really that new anymore, and it’s not going anywhere soon. Yeah, there are still trade-offs where some queries are better on the legacy estimator, but for the people who use that as their reason for not using the new estimator, what have you done in the past decade to address this and tune those queries to work better? If the answer is “nothing,” it’s not the cardinality estimator’s fault here.

1 Comment

Ordered Insert Optimization in OrioleDB

Published 2025-08-20 by Kevin Feasel

Alexander Korotkov deals with hot page issues:

When many sessions try to insert into the same B-tree leaf page, classic exclusive page locking serializes progress and wastes time on sleep/wake cycles. We’re introducing a batch page insertion path that lets the session holding the page lock insert for itself and its neighbors. The result: dramatically reduced lock waits, and big gains at high client counts (2X throughput boost starting from 64 clients in our benchmark).

Click through to see how it differs from normal PostgreSQL behavior on what is one of the more common performance problems on busy OLTP systems.

Comments closed

Category: Performance Tuning

Using Community Tools to Troubleshoot SQL Server Performance

Narrowing down Slowdown Causes in SQL Server

Splitting GUIDs into Multiple BIGINTs for Columnstore

Scaling On-Prem Vector Search with Ollama and Nginx

Memory Grant Feedback Woes

Tracking Performance Problems with Windows Performance Recorder

Comparing Spark Application Performance in Microsoft Fabric

More Fun with Page Latches

The Downside of Sticking to the Legacy Cardinality Estimator

Ordered Insert Optimization in OrioleDB