Press "Enter" to skip to content

Category: Performance Tuning

The SQL Server Execution Plan Reference

Hugo Kornelis has embarked on a major project:

I didn’t choose the term “Execution Plan Reference” by accident. The core of the EPR will be a full description of all that is known about every operator known to exist in execution plans: what it does, how it functions, what properties it can have, how those properties affect its behavior, and any additional information that may be relevant to understand the operator. This section will be one page for each operator. Of course, some operators are fairly simple while others are more complex, so those pages will vary in length.

Apart from that core content, I planned some generic pages. It makes no sense to repeat the explanation for properties such as Estimated Number of Rows or Number of Executions on each operator’s page, so instead I wanted to have a single page to list and describe all common properties. I also wanted an introduction page that explains the basics of reading an execution plan, lists the properties for plans as a whole, and debunks some common misconceptions.

And there will be articles with additional background. Instead of having to explain what exactly an “anti semi join” is on each of the four pages for the four join operators, I decided to create a single page describing all the logical join types. When working on Hash Match, the page was already very long and complex before I even had a chance to start on the details of the “dynamic destaging” process that handles memory spills, so I decided to leave that for a future page. As I continue to work on the EPR, I will probably continue to create or plan separate pages for content that applies to multiple operators, or that I consider too deep and too advanced for the operator’s normal page.

This is a huge undertaking, but even in its current state, the Execution Plan Reference looks great and has tremendous potential.
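
As a small illustration of the kind of background the EPR covers: the “anti semi join” Hugo mentions is the logical operation behind queries like the hypothetical NOT EXISTS example below (table and column names are mine, purely for illustration). In a plan it typically shows up as a Nested Loops, Merge Join, or Hash Match operator whose logical operation is Left Anti Semi Join.

    -- Hypothetical example: return customers with no orders.
    -- The optimizer implements this as a left anti semi join: a row from
    -- Customers is returned only when no matching row exists in Orders,
    -- and no columns from Orders appear in the output.
    SELECT c.CustomerID, c.CustomerName
    FROM dbo.Customers AS c
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM dbo.Orders AS o
        WHERE o.CustomerID = c.CustomerID
    );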


What TDE Does To Query Performance

Matthew McGiffen has a few tests on using Transparent Data Encryption:

By the time it had been executed 5 times (with the memory flushed between each execution) each query read about 600,000 pages sized at 8KB each – just under 5GB. If it took 50 seconds to decrypt those pages, then each page took about one twelfth of a millisecond to decrypt – or alternatively, TDE decrypted about 12 pages per millisecond. Or in terms of disk size, 100MB per second. These were tests on a server with magnetic spinning disks (not SSDs) and you can see from the above figures that the straight disk access took about 40 seconds on its own.

When TDE doesn’t read from disk it doesn’t add any overhead, but how do we quantify what the overhead to queries is when it does have to access the disk?

Matthew has some good advice here, and I’d say his experience is within the norm for TDE: nothing in it contradicts the general guidance strongly enough to shift priors.
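
To put the quoted numbers in perspective, here’s a quick back-of-the-envelope check using Matthew’s figures (the 600,000 pages and 50 seconds are his measurements; the script below is just arithmetic):

    -- Rough arithmetic on the figures quoted above.
    DECLARE @pages           bigint = 600000,  -- 8KB pages read per run
            @page_kb         int    = 8,
            @decrypt_seconds int    = 50;      -- time attributed to TDE decryption

    SELECT total_gb     = @pages * @page_kb / 1048576.0,                  -- ~4.6 GB read
           ms_per_page  = @decrypt_seconds * 1000.0 / @pages,             -- ~0.083 ms per page
           pages_per_ms = @pages / (@decrypt_seconds * 1000.0),           -- ~12 pages per ms
           mb_per_sec   = @pages * @page_kb / 1024.0 / @decrypt_seconds;  -- ~94 MB per second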


Views Don’t Improve Performance

Grant Fritchey lays down the law on views:

One day, it’s going to happen. I’m going to hear some crazy theory about how SQL Server works and I’m going to literally explode. Instead of some long silly rant with oddball literary & pop culture references you’ll get a screed the size of Ulysses (and about as much fun to read). However, for the moment, like Robin Williams describing a dance move, I’m going to keep it all inside. Here’s our query:

No, no where clause because we have to compare this to this, our view:

Grant used up much of his strategic reserve of GIFs in that post, so check it out.
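
The underlying point is that a view (indexed views aside) is just a stored query definition: the optimizer expands it into the outer query and produces the same plan you would get without it. A minimal sketch with my own hypothetical objects, not Grant’s example:

    -- A view adds no performance magic: the optimizer inlines its definition.
    CREATE OR ALTER VIEW dbo.RecentOrders
    AS
    SELECT OrderID, CustomerID, OrderDate, TotalDue
    FROM dbo.Orders;
    GO

    -- These two statements produce effectively the same execution plan,
    -- because the view is expanded into the query before optimization.
    SELECT OrderID, CustomerID, OrderDate, TotalDue
    FROM dbo.RecentOrders
    WHERE CustomerID = 42;

    SELECT OrderID, CustomerID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = 42;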


Scalar Function Blocking

Erik Darling notes that scalar functions can cause multi-table blocking:

Someone had tried to be clever. Looking at the code running, if you’ve been practicing SQL Server for a while, usually means one thing.

A Scalar Valued Function was running!

In this case, here’s what it looked like:

Someone had added that function as a computed column to the Users table:

Spoilers: this was a bad idea.
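
To make the shape of the problem concrete, here’s a hypothetical sketch of the pattern Erik describes (not his actual code): a scalar user-defined function that reads another table, wrapped up in a computed column. Every query that touches the column runs the function row by row, which forces serial plans and spreads locking to whatever tables the function reads.

    -- Hypothetical recreation of the pattern (not the code from the post).
    CREATE OR ALTER FUNCTION dbo.GetUserScore (@UserId int)
    RETURNS int
    AS
    BEGIN
        -- The hidden data access is the problem: every row that evaluates
        -- the computed column also reads from Posts, taking locks there.
        RETURN
        (
            SELECT SUM(p.Score)
            FROM dbo.Posts AS p
            WHERE p.OwnerUserId = @UserId
        );
    END;
    GO

    -- Adding it as a computed column means innocent-looking queries against
    -- Users now execute the function once per row touched.
    ALTER TABLE dbo.Users
        ADD TotalScore AS dbo.GetUserScore(Id);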


Row-By-Row Is Slow-By-Slow

Lukas Eder points out that row-by-row updates are a great way of slowing down your system:

The best way to find out is to benchmark. I’m doing two benchmarks for this:

  1. One that is run in PL/SQL, showing the performance difference between different approaches that are available to PL/SQL (namely looping, the FORALL syntax, and a single bulk UPDATE)

  2. One that is run in Java, doing JDBC calls, showing the performance difference between different approaches available to Java (namely looping, caching PreparedStatement but still looping, batching, and a single bulk UPDATE)

The results tend to be even more dramatic on SQL Server, where the row-by-row overhead is even greater.
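
Lukas’s benchmarks are in PL/SQL and Java, but the contrast is easy to reproduce in T-SQL. Here’s a hypothetical sketch: a cursor issuing one UPDATE per row versus a single set-based UPDATE doing the same work.

    -- Row-by-row: one UPDATE statement per row, each with its own
    -- execution overhead and its own trip through the engine.
    DECLARE @OrderID int;
    DECLARE order_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT OrderID FROM dbo.Orders WHERE Status = 'Pending';

    OPEN order_cursor;
    FETCH NEXT FROM order_cursor INTO @OrderID;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE dbo.Orders SET Status = 'Processed' WHERE OrderID = @OrderID;
        FETCH NEXT FROM order_cursor INTO @OrderID;
    END;
    CLOSE order_cursor;
    DEALLOCATE order_cursor;

    -- Set-based: one statement, one plan, one pass over the data.
    UPDATE dbo.Orders
    SET Status = 'Processed'
    WHERE Status = 'Pending';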


Finding Queries Which Drive The Missing Index DMV

Daniel Janik shows how you can find which queries are causing you pain due to missing indexes:

Missing indexes are an important part of the indexing strategy. I usually start with sys.dm_db_index_usage_stats to find both inefficient and unused indexes and then supplement with missing indexes.

The missing index DMVs are great but they’ve always been missing something.

What are they missing you ask? They currently tell you what table they are for but not what query. How do I know if the queries that sponsored this missing index are business critical or not? Wouldn’t it be nice to know what statements caused this “missing index” to appear?

Read on to learn how to do this.
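
If you want a head start before reading the post, one common technique (not necessarily the one Daniel uses) is to search cached plans for the MissingIndexes element, which ties a missing index recommendation back to the statement that produced it:

    -- Find cached plans whose XML contains missing index recommendations.
    WITH XMLNAMESPACES
        (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT TOP (20)
           st.text AS query_text,
           qs.execution_count,
           qs.total_logical_reads,
           qp.query_plan
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    WHERE qp.query_plan.exist('//MissingIndexes') = 1
    ORDER BY qs.total_logical_reads DESC;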


Auto Soft-NUMA And Scheduler Waits

Joe Obbish walks us through a scenario with automatic soft-NUMA leading to poor performance:

Consider a server with soft-NUMA nodes of 8 schedulers with MAXDOP 8. The first parallel query will be sent to NUMA node 0. The number of active workers matches the number of schedulers exactly so each active worker is assigned to a different scheduler in the NUMA node. The second parallel query will be sent to NUMA node 1. The third parallel query will be sent to NUMA node 2, and so on. Execution of serial queries or creation of sessions does not matter. That advances a counter that’s separate from the “global enumerator” used for parallel query scheduler placement. As far as I can tell the scheduler assigned to execution context 0 does not affect the scheduling of the parallel worker threads, although it can certainly affect parallel query performance.

The scenario described above doesn’t sound so bad. It can work well if the parallel queries take roughly about the same amount of time to complete and query MAXDOP matches the number of schedulers per soft-NUMA node. Problems can emerge when at least one of those is not true. With the spread selection type it’s possible that the amount of work already assigned to schedulers has no effect on parallel query scheduler placement. Let that sink in. You could have 100 serial queries all assigned to schedulers in NUMA node 0 but SQL Server may send a parallel query to that NUMA node. It depends on the position of the “global enumerator” as opposed to current work on the server.

Joe offers up some alternatives if you find yourself dealing with this issue.  Definitely a must-read.
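
If you want to see how this plays out on your own server, a quick look at sys.dm_os_schedulers shows how schedulers and their current workload are spread across nodes. This is a generic diagnostic query, not Joe’s script:

    -- How schedulers and tasks are spread across (soft-)NUMA nodes right now.
    SELECT parent_node_id,
           COUNT(*)                  AS visible_schedulers,
           SUM(current_tasks_count)  AS current_tasks,
           SUM(runnable_tasks_count) AS runnable_tasks
    FROM sys.dm_os_schedulers
    WHERE status = 'VISIBLE ONLINE'
    GROUP BY parent_node_id
    ORDER BY parent_node_id;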


Optimizing SSIS Throughput With Buffer Properties

Andy Leonard explains how he uses data flow properties to tune SQL Server Integration Services package performance:

I started answering a question on SQL Community Slack’s #ssis channel and I realized this would be better served as a blog post. The question was about three SSIS Data Flow properties: DefaultBufferSize, EngineThreads and DefaultBufferMaxRows.

I rarely change the EngineThreads property.

DefaultBufferSize and DefaultBufferMaxRows are two ways of managing the size limits of a Data Flow buffer. The two Data Flow Task properties can – and should – be treated as a single property. DefaultBufferSize is the number of bytes per buffer. DefaultBufferMaxRows is the number of rows per buffer. The defaults are 10,485,760 (10M) and 10,000, respectively.

Click through to learn more about these properties.
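
The practical consequence of treating them as a single property is that, roughly speaking, each buffer is sized by whichever limit it hits first. Here’s the arithmetic with an assumed 1,200-byte row (your row width will differ):

    -- Rough buffer math for the SSIS defaults, with an assumed row width.
    DECLARE @DefaultBufferSize    int = 10485760,  -- bytes per buffer (default)
            @DefaultBufferMaxRows int = 10000,     -- rows per buffer (default)
            @EstimatedRowBytes    int = 1200;      -- assumption for this example

    SELECT rows_that_fit_by_size = @DefaultBufferSize / @EstimatedRowBytes,  -- 8,738
           effective_rows_per_buffer =
               CASE WHEN @DefaultBufferSize / @EstimatedRowBytes < @DefaultBufferMaxRows
                    THEN @DefaultBufferSize / @EstimatedRowBytes
                    ELSE @DefaultBufferMaxRows
               END;                                                          -- size wins here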


Reverse Engineering The Stream Aggregate Algorithm

Itzik Ben-Gan has started a series of articles on optimizing queries which group and aggregate data, beginning with a reverse-engineering of the stream aggregate algorithm:

As you may already know, when SQL Server optimizes a query, it evaluates multiple candidate plans, and eventually picks the one with the lowest estimated cost. The estimated plan cost is the sum of all the operators’ estimated costs. In turn, each operator’s estimated cost is the sum of the estimated I/O cost and estimated CPU cost. The cost unit is meaningless in its own right. Its relevance is in the comparison that the optimizer makes between candidate plans. That is, the costing formulas were designed with the goal that, between candidate plans, the one with the lowest cost will (hopefully) represent the one that will finish more quickly. A terribly complex task to do accurately!

The more the costing formulas adequately take into account the factors that truly affect the algorithm’s performance and scaling, the more accurate they are, and the more likely that given accurate cardinality estimates, the optimizer will choose the optimal plan. At any rate, if you want to understand why the optimizer chooses one algorithm versus another you need to understand two main things: one is how the algorithms work and scale, and another is SQL Server’s costing model.

So back to the plan in Figure 1; let’s try and understand how the costs are computed. As a policy, Microsoft will not reveal the internal costing formulas that they use. When I was a kid I was fascinated with taking things apart. Watches, radios, cassette tapes (yes, I’m that old), you name it. I wanted to know how things were made. Similarly, I see value in reverse engineering the formulas since if I manage to predict the cost reasonably accurately, it probably means that I understand the algorithm well. During the process you get to learn a lot.

Our query ingests 1,000,000 rows. Even with this number of rows, the I/O cost seems to be negligible compared to the CPU cost, so it is probably safe to ignore it.

As for the CPU cost, you want to try and figure out which factors affect it and in what way.

I give this my highest recommendation.
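
If you want to play along at home, a grouped query of the shape Itzik analyzes is easy to set up. With an index whose leading key matches the GROUP BY column, the optimizer can feed a Stream Aggregate without an explicit Sort, and you can read the Estimated CPU Cost off the operator in the plan. Table and index names below are hypothetical:

    -- Hypothetical setup: an index ordered on the grouping column lets the
    -- optimizer use a Stream Aggregate with no Sort operator in front of it.
    CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);
    GO

    -- Check the Estimated CPU Cost and Estimated I/O Cost of the
    -- Stream Aggregate operator in the estimated or actual plan.
    SELECT CustomerID, COUNT(*) AS NumOrders
    FROM dbo.Orders
    GROUP BY CustomerID;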


The Value Of Live Query Stats In SSMS

Rob Farley expresses his appreciation of Live Query Stats in SQL Server Management Studio:

I wrote about Live Query Statistics within SSMS a while back – and even presented at conferences about how useful it is for understanding how queries run…

…but what I love is that at customers where I have long-running queries to deal with, I can keep an eye on the queries as they execute. I can see how the Actuals are forming, and quickly notice whether the iterations in a Nested Loop are going to be unreasonable, or whether I’m happy enough with things. I don’t always want to spend time tuning a once-off query, but if I run it with LQS turned on, I can easily notice if it’s going to be ages before I see any rows back, see any blocking operators that are going to frustrate me, and so on.

I don’t use it often, but when I do, I typically learn something interesting about the query I’m running.
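
Live Query Statistics draws on the profiling infrastructure exposed by sys.dm_exec_query_profiles, so you can also watch a long-running query’s per-operator progress yourself. Session 53 below is just a placeholder:

    -- Actual vs. estimated rows per operator for a query that is running now.
    -- Session 53 is a placeholder; the session must have profiling enabled
    -- (for example, by running with SET STATISTICS PROFILE ON).
    SELECT node_id,
           physical_operator_name,
           row_count,
           estimate_row_count
    FROM sys.dm_exec_query_profiles
    WHERE session_id = 53
    ORDER BY node_id;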
