Category: Performance Tuning

Scalar Function Blocking

Published 2018-04-27 by Kevin Feasel

Erik Darling notes that scalar functions can cause multi-table blocking:

Someone had tried to be clever. Looking at the code running, if you’ve been practicing SQL Server for a while, usually means one thing.

A Scalar Valued Function was running!

In this case, here’s what it looked like:

CREATE OR ALTER FUNCTION dbo.BadIdea ( @uid INT )
RETURNS BIGINT
WITH RETURNS NULL ON NULL INPUT, SCHEMABINDING
AS
    BEGIN
        DECLARE @BCount BIGINT;
        SELECT   @BCount = COUNT_BIG(*)
        FROM     dbo.Badges AS b
        WHERE    b.UserId = @uid
        GROUP BY b.UserId;
        RETURN @BCount;
    END;

Someone had added that function as a computed column to the Users table:

ALTER TABLE dbo.Users ADD BadgeCount AS dbo.BadIdea(Id);

Spoilers: this was a bad idea.
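
As a point of comparison (this is not taken from Erik's post), one common way to expose a per-user badge count without a scalar function is a plain view over a join. The sketch below assumes only the dbo.Users (Id) and dbo.Badges (UserId) columns visible in the excerpt; the view name dbo.UserBadgeCounts is hypothetical.

```sql
-- Sketch only: dbo.UserBadgeCounts is a hypothetical name, and this is
-- an illustrative alternative rather than the fix described in the linked post.
CREATE OR ALTER VIEW dbo.UserBadgeCounts
AS
SELECT   u.Id,
         -- COUNT_BIG over the nullable side of the outer join counts only
         -- matched badge rows, so users with no badges get 0.
         COUNT_BIG(b.UserId) AS BadgeCount
FROM     dbo.Users AS u
LEFT JOIN dbo.Badges AS b
    ON b.UserId = u.Id
GROUP BY u.Id;
```

One behavioral difference to keep in mind: as written, the function leaves @BCount unassigned when a user has no badges and so returns NULL, whereas this view returns 0.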

Row-By-Row Is Slow-By-Slow

Published 2018-04-24 by Kevin Feasel

Lukas Eder points out that row-by-row updates are a great way of slowing down your system:

The best way to find out is to benchmark. I’m doing two benchmarks for this:

One that is run in PL/SQL, showing the performance difference between different approaches that are available to PL/SQL (namely looping, the FORALL syntax, and a single bulk UPDATE)
One that is run in Java, doing JDBC calls, showing the performance difference between different approaches available to Java (namely looping, caching PreparedStatement but still looping, batching, and a single bulk UPDATE)

The results tend to be even more dramatic on SQL Server, where the row-by-row overhead is even greater.
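
To make the contrast concrete on the SQL Server side, here is a minimal T-SQL sketch of the two extremes: a cursor issuing one UPDATE per row versus a single set-based UPDATE. The #Accounts table, its columns, and the 1% adjustment are all hypothetical; Lukas's benchmarks use PL/SQL and JDBC, not this script.

```sql
-- Hypothetical demo table; every name and value here is illustrative only.
CREATE TABLE #Accounts
(
    Id      INT            NOT NULL PRIMARY KEY,
    Balance DECIMAL(18, 2) NOT NULL
);

INSERT INTO #Accounts (Id, Balance)
SELECT TOP (10000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       100.00
FROM   sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Row-by-row: the cursor drives one UPDATE statement per row.
DECLARE @Id INT;
DECLARE account_cursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR
    SELECT Id FROM #Accounts;

OPEN account_cursor;
FETCH NEXT FROM account_cursor INTO @Id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE #Accounts SET Balance = Balance * 1.01 WHERE Id = @Id;
    FETCH NEXT FROM account_cursor INTO @Id;
END;
CLOSE account_cursor;
DEALLOCATE account_cursor;

-- Set-based: a single statement applies the same change to every row.
UPDATE #Accounts SET Balance = Balance * 1.01;

DROP TABLE #Accounts;
```

Wrapping each half in SET STATISTICS TIME ON, or simply timing the two in SSMS, is one way to see the gap on your own hardware.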

Finding Queries Which Drive The Missing Index DMV

Published 2018-04-23 by Kevin Feasel

Daniel Janik shows how you can find which queries are causing you pain due to missing indexes:

Missing indexes are an important part of the indexing strategy. I usually start with sys.dm_db_index_usage_stats to find both inefficient and unused indexes and then supplement with missing indexes.

The missing index DMVs are great but they’ve always been missing something.

What are they missing you ask? They currently tell you what table they are for but not what query. How do I know if the queries that sponsored this missing index are business critical or not? Wouldn’t it be nice to know what statements caused this “missing index” to appear?

Read on to learn how to do this.
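
For context, the gap Daniel describes is visible in the usual missing index query, sketched below. It joins the three documented missing index DMVs and returns the table and suggested columns, but nothing that identifies the statement that triggered the suggestion. The ranking expression in the ORDER BY is a common rule of thumb and an assumption on my part, not something from the linked post.

```sql
-- The standard missing index DMVs: they name the table and the
-- suggested key/include columns, but not the query that wanted the index.
SELECT TOP (20)
       d.[statement]          AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.user_seeks,
       gs.avg_total_user_cost,
       gs.avg_user_impact
FROM   sys.dm_db_missing_index_details AS d
JOIN   sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN   sys.dm_db_missing_index_group_stats AS gs
    ON gs.group_handle = g.index_group_handle
-- Rough prioritization only; treat the ordering as a heuristic.
ORDER BY gs.user_seeks * gs.avg_total_user_cost * gs.avg_user_impact DESC;
```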

Auto Soft-NUMA And Scheduler Waits

Published 2018-04-17 by Kevin Feasel

Joe Obbish walks us through a scenario with automatic soft-NUMA leading to poor performance:

Consider a server with soft-NUMA nodes of 8 schedulers with MAXDOP 8. The first parallel query will be sent to NUMA node 0. The number of active workers matches the number of schedulers exactly so each active worker is assigned to a different scheduler in the NUMA node. The second parallel query will be sent to NUMA node 1. The third parallel query will be sent to NUMA node 2, and so on. Execution of serial queries or creation of sessions does not matter. That advances a counter that’s separate from the “global enumerator” used for parallel query scheduler placement. As far as I can tell the scheduler assigned to execution context 0 does not affect the scheduling of the parallel worker threads, although it can certainly affect parallel query performance.

The scenario described above doesn’t sound so bad. It can work well if the parallel queries take roughly the same amount of time to complete and query MAXDOP matches the number of schedulers per soft-NUMA node. Problems can emerge when at least one of those is not true. With the spread selection type it’s possible that the amount of work already assigned to schedulers has no effect on parallel query scheduler placement. Let that sink in. You could have 100 serial queries all assigned to schedulers in NUMA node 0 but SQL Server may send a parallel query to that NUMA node. It depends on the position of the “global enumerator” as opposed to current work on the server.

Joe offers up some alternatives if you find yourself dealing with this issue. Definitely a must-read.
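
If you want to check whether work is piling up unevenly across nodes on your own server, a rough starting point is to aggregate sys.dm_os_schedulers by node, as in the sketch below. This is a generic diagnostic rather than a query from Joe's post, and a single snapshot may easily miss a transient imbalance.

```sql
-- Is automatic soft-NUMA in effect? (column available in SQL Server 2016 and later)
SELECT softnuma_configuration_desc
FROM   sys.dm_os_sys_info;

-- Snapshot of how tasks and workers are spread across nodes right now.
SELECT   parent_node_id,
         COUNT(*)                  AS schedulers,
         SUM(current_tasks_count)  AS current_tasks,
         SUM(runnable_tasks_count) AS runnable_tasks,
         SUM(active_workers_count) AS active_workers
FROM     sys.dm_os_schedulers
WHERE    status = N'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;
```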

Optimizing SSIS Throughput With Buffer Properties

Published 2018-04-16 by Kevin Feasel

Andy Leonard explains how he uses data flow properties to tune SQL Server Integration Services package performance:

I started answering a question on SQL Community Slack’s #ssis channel and I realized this would be better served as a blog post. The question was about three SSIS Data Flow properties: DefaultBufferSize, EngineThreads, and DefaultBufferMaxRows.

I rarely change the EngineThreads property.

DefaultBufferSize and DefaultBufferMaxRows are two ways of managing the size limits of a Data Flow buffer. The two Data Flow Task properties can – and should – be treated as a single property. DefaultBufferSize is the number of bytes per buffer. DefaultBufferMaxRows is the number of rows per buffer. The defaults are 10,485,760 (10M) and 10,000, respectively.

Click through to learn more about these properties.
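
To see why the two properties behave like a single limit, it can help to work through the arithmetic. The sketch below uses the defaults quoted above and a hypothetical estimated row width of 2,000 bytes. It is a simplified model (whichever limit is reached first wins), not a description of the exact sizing logic inside the SSIS engine.

```sql
DECLARE @DefaultBufferSize    INT = 10485760; -- bytes per buffer (default quoted above)
DECLARE @DefaultBufferMaxRows INT = 10000;    -- rows per buffer (default quoted above)
DECLARE @EstimatedRowBytes    INT = 2000;     -- hypothetical row width, for illustration

SELECT @DefaultBufferSize / @EstimatedRowBytes AS RowsThatFitBySize,
       -- Simplified model: the smaller of the two limits is the effective cap.
       CASE
           WHEN @DefaultBufferSize / @EstimatedRowBytes < @DefaultBufferMaxRows
               THEN @DefaultBufferSize / @EstimatedRowBytes
           ELSE @DefaultBufferMaxRows
       END AS ApproxRowsPerBuffer;
```

Under this model, rows of about 2,000 bytes hit the size limit at roughly 5,242 rows, well before the 10,000-row limit, so raising DefaultBufferMaxRows alone would change nothing for that data flow.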

Reverse Engineering The Stream Aggregate Algorithm

Published 2018-04-12 by Kevin Feasel

Itzik Ben-Gan has started a series of articles on optimizing queries which use grouping and aggregating with a reverse-engineering of the stream aggregate algorithm:

As you may already know, when SQL Server optimizes a query, it evaluates multiple candidate plans, and eventually picks the one with the lowest estimated cost. The estimated plan cost is the sum of all the operators’ estimated costs. In turn, each operator’s estimated cost is the sum of the estimated I/O cost and estimated CPU cost. The cost unit is meaningless in its own right. Its relevance is in the comparison that the optimizer makes between candidate plans. That is, the costing formulas were designed with the goal that, between candidate plans, the one with the lowest cost will (hopefully) represent the one that will finish more quickly. A terribly complex task to do accurately!

The more the costing formulas adequately take into account the factors that truly affect the algorithm’s performance and scaling, the more accurate they are, and the more likely that given accurate cardinality estimates, the optimizer will choose the optimal plan. At any rate, if you want to understand why the optimizer chooses one algorithm versus another you need to understand two main things: one is how the algorithms work and scale, and another is SQL Server’s costing model.

So back to the plan in Figure 1; let’s try and understand how the costs are computed. As a policy, Microsoft will not reveal the internal costing formulas that they use. When I was a kid I was fascinated with taking things apart. Watches, radios, cassette tapes (yes, I’m that old), you name it. I wanted to know how things were made. Similarly, I see value in reverse engineering the formulas since if I manage to predict the cost reasonably accurately, it probably means that I understand the algorithm well. During the process you get to learn a lot.

Our query ingests 1,000,000 rows. Even with this number of rows, the I/O cost seems to be negligible compared to the CPU cost, so it is probably safe to ignore it.

As for the CPU cost, you want to try and figure out which factors affect it and in what way.

I give this my highest recommendation.
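
If you want to follow along with this kind of investigation, one simple setup is to run the same grouped query under the ORDER GROUP and HASH GROUP query hints and compare the estimated operator costs in the two plans. The temporary table and data below are hypothetical, sized at 1,000,000 rows only to echo the row count in the excerpt; they are not Itzik's test data.

```sql
-- Hypothetical test table; names and data distribution are illustrative only.
CREATE TABLE #Orders
(
    OrderId    INT NOT NULL PRIMARY KEY,
    CustomerId INT NOT NULL
);

INSERT INTO #Orders (OrderId, CustomerId)
SELECT TOP (1000000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       ABS(CHECKSUM(NEWID()) % 1000)   -- pseudo-random customer id in 0..999
FROM   sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Ask for an order-based (Stream Aggregate) plan.
SELECT   CustomerId, COUNT(*) AS OrderCount
FROM     #Orders
GROUP BY CustomerId
OPTION   (ORDER GROUP);

-- Ask for a Hash Aggregate plan for comparison.
SELECT   CustomerId, COUNT(*) AS OrderCount
FROM     #Orders
GROUP BY CustomerId
OPTION   (HASH GROUP);

DROP TABLE #Orders;
```

With the actual or estimated plan displayed in SSMS, the Estimated CPU Cost and Estimated I/O Cost properties of each aggregate operator are the numbers being reverse engineered.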

The Value Of Live Query Stats In SSMS

Published 2018-04-10 by Kevin Feasel

Rob Farley exclaims his appreciation of Live Query Stats in SQL Server Management Studio:

I wrote about Live Query Statistics within SSMS a while back – and even presented at conferences about how useful it is for understanding how queries run…

…but what I love is that at customers where I have long-running queries to deal with, I can keep an eye on the queries as they execute. I can see how the Actuals are forming, and quickly notice whether the iterations in a Nested Loop are going to be unreasonable, or whether I’m happy enough with things. I don’t always want to spend time tuning a once-off query, but if I run it with LQS turned on, I can easily notice if it’s going to be ages before I see any rows back, see any blocking operators that are going to frustrate me, and so on.

I don’t use it often, but when I do, I typically learn something interesting about the query I’m running.
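
Outside the SSMS visualization, similar per-operator progress can be read from the sys.dm_exec_query_profiles DMV. The sketch below assumes a hypothetical session id of 55, and depending on version and settings the target session may need query profiling enabled before rows show up here.

```sql
-- Per-operator progress for a running query; 55 is a hypothetical session id.
-- Rows are summed per node because parallel plans report one row per thread.
SELECT   p.node_id,
         p.physical_operator_name,
         SUM(p.row_count)          AS rows_so_far,
         SUM(p.estimate_row_count) AS estimated_rows
FROM     sys.dm_exec_query_profiles AS p
WHERE    p.session_id = 55
GROUP BY p.node_id, p.physical_operator_name
ORDER BY p.node_id;
```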

Monitoring Performance Of Natively Compiled Stored Procedures

Published 2018-04-09 by Kevin Feasel

Jos de Bruijn announces a feature coming to the next version of SQL Server:

We just added new database-scoped configuration options that will help with monitoring performance of natively compiled stored procedures. The new options XTP_PROCEDURE_EXECUTION_STATISTICS and XTP_QUERY_EXECUTION_STATISTICS are available now in Azure SQL Database, and will be available in the next major release of SQL Server. These options will improve your monitoring and troubleshooting experience for databases leveraging In-Memory OLTP with natively compiled stored procedures.

After enabling these options, you can monitor the performance of natively compiled stored procedures using Query Store, as well as the DMVs sys.dm_exec_query_stats and sys.dm_exec_procedure_stats. Note that there is a performance impact to enabling execution statistics collection, thus we recommend to disable stats collection when not needed.

That last sentence is important: there’s an observer effect which slows down execution of natively compiled stored procedures, and considering that you’re implementing them specifically for the speed, that’s fairly unwelcome.
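
Given that overhead, a reasonable pattern is to switch collection on for a limited window, read the results, and switch it off again. The sketch below uses the two option names from the announcement with ALTER DATABASE SCOPED CONFIGURATION; the choice of columns and the ordering in the DMV query are my own, for illustration.

```sql
-- Turn collection on for the current database.
ALTER DATABASE SCOPED CONFIGURATION SET XTP_PROCEDURE_EXECUTION_STATISTICS = ON;
ALTER DATABASE SCOPED CONFIGURATION SET XTP_QUERY_EXECUTION_STATISTICS = ON;

-- ... run the workload you want to measure ...

-- Read procedure-level statistics for this database.
SELECT   OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
         ps.execution_count,
         ps.total_worker_time,
         ps.total_elapsed_time
FROM     sys.dm_exec_procedure_stats AS ps
WHERE    ps.database_id = DB_ID()
ORDER BY ps.total_worker_time DESC;

-- Turn collection back off to avoid the ongoing overhead.
ALTER DATABASE SCOPED CONFIGURATION SET XTP_PROCEDURE_EXECUTION_STATISTICS = OFF;
ALTER DATABASE SCOPED CONFIGURATION SET XTP_QUERY_EXECUTION_STATISTICS = OFF;
```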

Tuning Recommendations In SQL Server 2017

Published 2018-04-03 by Kevin Feasel

Kendra Little shows that even if you don’t want to use automatic tuning in SQL Server 2017, you can still see the tuning recommendations:

Even though automatic tuning wasn’t enabled, SQL Server picked up on the performance changes. I got a recommendation in sys.dm_db_tuning_recommendations.

reason: Average query CPU time changed from 2127.84ms to 66291.9ms

state: {"currentValue":"Active","reason":"AutomaticTuningOptionNotEnabled"}

The details also include the query id in question, and the plan_id of the “fast plan”.

It’s nice to check those out for a couple of weeks before turning automatic tuning on; that way, you can get more comfortable with the types of changes the tuning engine recommends, and if you happen to have a system which is terrible for automatic tuning, you can know that before turning the feature on.
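
A simple way to do that review is to query the DMV directly and unpack its JSON columns. The sketch below is one possible shape for such a query, not Kendra's script; the output column aliases are my own.

```sql
-- Review recommendations without enabling automatic tuning.
SELECT tr.name,
       tr.reason,
       tr.score,
       JSON_VALUE(tr.[state], '$.currentValue') AS state_value,
       JSON_VALUE(tr.[state], '$.reason')       AS state_reason,
       pfd.query_id,
       pfd.regressed_plan_id,
       pfd.recommended_plan_id
FROM   sys.dm_db_tuning_recommendations AS tr
CROSS APPLY OPENJSON(tr.details, '$.planForceDetails')
    WITH (
        query_id            INT '$.queryId',
        regressed_plan_id   INT '$.regressedPlanId',
        recommended_plan_id INT '$.recommendedPlanId'
    ) AS pfd;
```

The query_id and plan ids can then be looked up in Query Store to see the plans involved.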

More Fun With Row Goals

Published 2018-03-30 by Kevin Feasel

Joe Obbish has a detailed investigation into performance of a simple-enough query:

Something looks very wrong here. The loop join plan has a significantly lower cost than the hash join plan! In fact, the loop join plan has a total cost of 0.0167621 optimizer units. Why would disabling row goals for such a plan cause a decrease in total query cost?

I uploaded the estimated plans here for those who wish to examine them without going through the trouble of creating tables.

It’s a long but worthwhile read.
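
For readers who want to experiment with the same kind of comparison, the documented DISABLE_OPTIMIZER_ROWGOAL hint lets you compile a query with and without row goals and compare the estimated costs. The tables and query below are hypothetical stand-ins, not the ones from Joe's post, and the hint requires SQL Server 2016 SP1 or later.

```sql
-- Hypothetical tables; names and data are illustrative only.
CREATE TABLE #OuterRows (Id INT NOT NULL PRIMARY KEY);
CREATE TABLE #InnerRows (Id INT NOT NULL PRIMARY KEY);

INSERT INTO #OuterRows (Id)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM   sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

INSERT INTO #InnerRows (Id)
SELECT Id FROM #OuterRows WHERE Id % 10 = 0;

-- Default behavior: TOP introduces a row goal into the plan.
SELECT TOP (10) o.Id
FROM   #OuterRows AS o
WHERE  EXISTS (SELECT 1 FROM #InnerRows AS i WHERE i.Id = o.Id);

-- Same query compiled without row goals, for cost comparison.
SELECT TOP (10) o.Id
FROM   #OuterRows AS o
WHERE  EXISTS (SELECT 1 FROM #InnerRows AS i WHERE i.Id = o.Id)
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));

DROP TABLE #OuterRows;
DROP TABLE #InnerRows;
```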
