Press "Enter" to skip to content

Category: Performance Tuning

Reverse Engineering The Stream Aggregate Algorithm

Itzik Ben-Gan has started a series of articles on optimizing queries which use grouping and aggregation, and he begins by reverse-engineering the stream aggregate algorithm:

As you may already know, when SQL Server optimizes a query, it evaluates multiple candidate plans, and eventually picks the one with the lowest estimated cost. The estimated plan cost is the sum of all the operators’ estimated costs. In turn, each operator’s estimated cost is the sum of the estimated I/O cost and estimated CPU cost. The cost unit is meaningless in its own right. Its relevance is in the comparison that the optimizer makes between candidate plans. That is, the costing formulas were designed with the goal that, between candidate plans, the one with the lowest cost will (hopefully) represent the one that will finish more quickly. A terribly complex task to do accurately!

The more the costing formulas adequately take into account the factors that truly affect the algorithm’s performance and scaling, the more accurate they are, and the more likely that given accurate cardinality estimates, the optimizer will choose the optimal plan. At any rate, if you want to understand why the optimizer chooses one algorithm versus another you need to understand two main things: one is how the algorithms work and scale, and another is SQL Server’s costing model.

So back to the plan in Figure 1; let’s try and understand how the costs are computed. As a policy, Microsoft will not reveal the internal costing formulas that they use. When I was a kid I was fascinated with taking things apart. Watches, radios, cassette tapes (yes, I’m that old), you name it. I wanted to know how things were made. Similarly, I see value in reverse engineering the formulas since if I manage to predict the cost reasonably accurately, it probably means that I understand the algorithm well. During the process you get to learn a lot.

Our query ingests 1,000,000 rows. Even with this number of rows, the I/O cost seems to be negligible compared to the CPU cost, so it is probably safe to ignore it.

As for the CPU cost, you want to try and figure out which factors affect it and in what way.

I give this my highest recommendation.
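
For a concrete sense of the kind of query being costed, here is a minimal sketch of a grouped aggregate over a large table; the table and column names are hypothetical stand-ins, not Itzik's exact setup:

```sql
-- Hypothetical table and columns, for illustration only.
-- With an index that delivers rows ordered by the grouping column, the optimizer can use a
-- Stream Aggregate: rows arrive sorted, each group is aggregated as it streams past, and the
-- operator's cost is dominated by per-row CPU work rather than by I/O.
SELECT   custid,
         COUNT(*) AS numorders,
         SUM(val) AS totalval
FROM     dbo.Orders            -- assume roughly 1,000,000 rows
GROUP BY custid
OPTION (MAXDOP 1);             -- a serial plan keeps the costing arithmetic easier to reverse engineer
```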

The Value Of Live Query Stats In SSMS

Rob Farley expresses his appreciation of Live Query Stats in SQL Server Management Studio:

I wrote about Live Query Statistics within SSMS a while back – and even presented at conferences about how useful it is for understanding how queries run…

…but what I love is that at customers where I have long-running queries to deal with, I can keep an eye on the queries as they execute. I can see how the Actuals are forming, and quickly notice whether the iterations in a Nested Loop are going to be unreasonable, or whether I’m happy enough with things. I don’t always want to spend time tuning a once-off query, but if I run it with LQS turned on, I can easily notice if it’s going to be ages before I see any rows back, see any blocking operators that are going to frustrate me, and so on.

I don’t use it often, but when I do, I typically learn something interesting about the query I’m running.
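
For long-running queries, the graphical view in SSMS is backed by a DMV you can also query directly; a rough sketch (the session_id is a placeholder, and the monitored session needs the profiling infrastructure enabled, for example via SET STATISTICS XML ON):

```sql
-- Watch actuals accumulate, per operator, for a running session.
SELECT p.session_id,
       p.node_id,
       p.physical_operator_name,
       p.row_count,            -- actual rows produced so far
       p.estimate_row_count    -- the optimizer's estimate for this operator
FROM   sys.dm_exec_query_profiles AS p
WHERE  p.session_id = 53       -- placeholder: the SPID of the long-running query
ORDER BY p.node_id;
```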

Monitoring Performance Of Natively Compiled Stored Procedures

Jos de Bruijn announces a feature coming to the next version of SQL Server:

We just added new database-scoped configuration options that will help with monitoring performance of natively compiled stored procedures. The new options XTP_PROCEDURE_EXECUTION_STATISTICS and XTP_QUERY_EXECUTION_STATISTICS are available now in Azure SQL Database, and will be available in the next major release of SQL Server. These options will improve your monitoring and troubleshooting experience for databases leveraging In-Memory OLTP with natively compiled stored procedures.

After enabling these options, you can monitor the performance of natively compiled stored procedures using Query Store, as well as the DMVs sys.dm_exec_query_stats and sys.dm_exec_procedure_stats. Note that there is a performance impact to enabling execution statistics collection, thus we recommend disabling stats collection when not needed.

That last sentence is important: there's an observer effect which slows down execution of natively compiled stored procedures, and considering that you're implementing them specifically for speed, that's fairly unwelcome.
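
Based on the options named in the announcement, the workflow is roughly the following sketch; the procedure-stats query is just one way to read the results:

```sql
-- Turn collection on only while you need it, because of the overhead noted above.
ALTER DATABASE SCOPED CONFIGURATION SET XTP_PROCEDURE_EXECUTION_STATISTICS = ON;
ALTER DATABASE SCOPED CONFIGURATION SET XTP_QUERY_EXECUTION_STATISTICS = ON;

-- With collection enabled, natively compiled procedures show up in the usual DMVs.
SELECT OBJECT_NAME(ps.object_id) AS procedure_name,
       ps.execution_count,
       ps.total_worker_time,
       ps.total_elapsed_time
FROM   sys.dm_exec_procedure_stats AS ps
WHERE  ps.database_id = DB_ID()
ORDER BY ps.total_worker_time DESC;

-- Turn it back off when you're done measuring.
ALTER DATABASE SCOPED CONFIGURATION SET XTP_PROCEDURE_EXECUTION_STATISTICS = OFF;
ALTER DATABASE SCOPED CONFIGURATION SET XTP_QUERY_EXECUTION_STATISTICS = OFF;
```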

Tuning Recommendations In SQL Server 2017

Kendra Little shows that even if you don’t want to use automatic tuning in SQL Server 2017, you can still see the tuning recommendations:

Even though automatic tuning wasn’t enabled, SQL Server picked up on the performance changes. I got a recommendation in sys.dm_db_tuning_recommendations.

  • reason: Average query CPU time changed from 2127.84ms to 66291.9ms
  • state: {"currentValue":"Active","reason":"AutomaticTuningOptionNotEnabled"}

The details also include the query id in question, and the plan_id of the “fast plan”.

It's nice to check those out for a couple of weeks before turning automatic tuning on; that way, you can get more comfortable with the types of changes the tuning engine recommends, and if you happen to have a system which is a poor fit for automatic tuning, you'll find that out before enabling the feature.
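
A sketch of how to read those recommendations yourself; the JSON paths follow Microsoft's documented examples, so verify them against your build:

```sql
-- The state and details columns are JSON documents, so crack them open with JSON_VALUE.
SELECT  dtr.reason,
        dtr.score,
        JSON_VALUE(dtr.state,   '$.currentValue')                        AS current_state,
        JSON_VALUE(dtr.details, '$.planForceDetails.queryId')            AS query_id,
        JSON_VALUE(dtr.details, '$.planForceDetails.recommendedPlanId')  AS fast_plan_id,
        JSON_VALUE(dtr.details, '$.implementationDetails.script')        AS force_plan_script
FROM    sys.dm_db_tuning_recommendations AS dtr;
```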

More Fun With Row Goals

Joe Obbish has a detailed investigation into performance of a simple-enough query:

Something looks very wrong here. The loop join plan has a significantly lower cost than the hash join plan! In fact, the loop join plan has a total cost of 0.0167621 optimizer units. Why would disabling row goals for such a plan cause a decrease in total query cost?

I uploaded the estimated plans here for those who wish to examine them without going through the trouble of creating tables.

It’s a long but worthwhile read.
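
To give a flavor of the comparison without spoiling the post, this is its general shape; the table names are made up and this is not Joe's exact schema:

```sql
-- TOP introduces a row goal: the optimizer costs the plan as though it only needs a few rows.
SELECT TOP (1) o.id
FROM   dbo.OuterTable AS o
JOIN   dbo.InnerTable AS i
  ON   i.id = o.id;

-- The same query with row goals disabled; comparing the two estimated plans is where the
-- surprising costing behavior shows up.
SELECT TOP (1) o.id
FROM   dbo.OuterTable AS o
JOIN   dbo.InnerTable AS i
  ON   i.id = o.id
OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL'));
```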

Function Calls Missing From dm_exec_query_stats

Kendra Little blogs about a data collection oddity with functions in SQL Server:

Some of my functions in the demo code were showing up just fine. I was really puzzled by that. I thought …

Maybe this is a bug with ‘CREATE OR ALTER’? A sign of some weird memory pressure? Something introduced in SQL Server 2017? A buggy side effect of implicit conversions in some of the functions? A problem with the queries I was using? A weird setting on the database? (Also: about 100 other things that didn’t turn out to be the case.)

I finally wrote up some simple demo code, tested it against a SQL Server 2008 R2 instance (omitting the Query Store components), compared it with SQL Server 2017, and found it to be consistent.

Click through to see which types of functions show up and which ones stay hidden.
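
If you want to run a similar check on your own instance, a rough sketch (the function name is a placeholder):

```sql
-- Look for a function's text among cached query statistics; some function types never appear here.
SELECT qs.execution_count,
       qs.total_worker_time,
       qs.total_elapsed_time,
       st.text
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE  st.text LIKE N'%MyScalarFunction%'             -- placeholder function name
  AND  st.text NOT LIKE N'%dm_exec_query_stats%';     -- exclude this monitoring query itself
```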

Speeding Up Power BI Aggregations With Primary Keys

Chris Webb has an interesting use case for adding primary keys on lookup tables:

As you can see, the Property Type column from the #"Price Paid" query contains single letter codes describing the type of property sold in each transaction; the Property Type column from #"Property Types" contains a distinct list of the same codes and acts as a dimension table. Again there's nothing interesting going on in this query.

The problems start when you try to join data from these two queries using a Merge and then, for each row in #"Property Types", show the sum of the Price Paid column from #"Price Paid".

Although baseline performance is bad, Chris shows a way of improving that performance significantly.

All Execution Plans Are Estimates

Grant Fritchey drops a bomb on us:

All these resources, yet, for any given query, all the plans will be identical (assuming no recompile at work). Why? Because they’re all the same plan. Each and every one of them is an estimated plan. Only an estimated plan. This is why the estimated costs stay the same between an estimated and actual plan, this despite any disparity between estimated and actual row counts.

I've blogged about this before, but it's worth mentioning again. There are only a few minor differences between an estimated plan and an actual plan. It's all about the data set. What's going on is that an actual plan can capture query metrics, which are then appended to the estimated plan. At no point is any different plan generated during this process. It's just a plan, an estimated plan, or a plan plus query metrics.

Read the whole thing.
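
One way to see this for yourself is to request the plan without executing and then execute with runtime statistics, comparing the two XML documents; the table name below is a placeholder:

```sql
-- Estimated plan: compiled and returned, but the query never runs, so there are no actuals.
SET SHOWPLAN_XML ON;
GO
SELECT COUNT(*) FROM dbo.SomeTable;
GO
SET SHOWPLAN_XML OFF;
GO

-- "Actual" plan: the same compiled plan, now with RunTimeInformation elements appended
-- because the query actually executed.
SET STATISTICS XML ON;
GO
SELECT COUNT(*) FROM dbo.SomeTable;
GO
SET STATISTICS XML OFF;
GO
```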

The Optimal Kafka Message Size

Guy Shilo wants to figure out the right chunk size for a Kafka message:

I wrote a Python program that runs a producer and a consumer for 30 minutes with different message sizes and measures how many messages per second it can deliver, or the Kafka cluster throughput.

I did not care about the message content, so the consumer only reads the messages from the topic and then discards them. I used a three-partition topic. I guess that on larger clusters with more partitions the performance will be better, but the message size to throughput ratio will remain roughly the same.

So I wrote a small Python program that generates a dummy message of the desired size, then spawns two threads: one is a producer and the other is a consumer. The producer sends the same message over and over, and the consumer reads the messages from the topic and counts how many messages it has read. The main program stops after 30 minutes, but before it stops it prints how many messages were consumed and how many messages were consumed per second.

Read on for the results. More importantly, test in your own environment with your own equipment, as the optimal value is likely to differ a bit.

Checking Plan Compilation Time

Eric Blinn looks at plan compilation time in SQL Server:

The query returns 4 rows. By including STATISTICS TIME, we get extra information on the Messages output tab. We can see from the execution on my laptop that the optimizer took 6ms to compile a query plan and the actual query executed in only 1ms.

Run the query batch a few more times and notice that the parse and compile time drops to zero.  This is because SQL Server keeps a list of compiled plans and tries to reuse them without having to recompile.  In this case the optimizer has recognized that this query is exactly identical to one it has previously executed and it reuses the previously compiled plan.  That list of plans is called the Plan Cache and will be covered in much more detail in a subsequent post.

This cost is something we tend to forget about, but it can make a big difference for a user's experience.
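
The measurement is easy to reproduce; a minimal sketch, with a stand-in query rather than Eric's example:

```sql
SET STATISTICS TIME ON;

SELECT id, name
FROM   dbo.SomeSmallTable      -- placeholder table
WHERE  id = 42;

SET STATISTICS TIME OFF;

-- First run, on the Messages tab:
--   SQL Server parse and compile time: ... elapsed time = 6 ms.
-- Re-run the batch and the parse and compile time drops to 0 ms once the cached plan is reused.
```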
