Using Powershell To Shred Query Plan XML

Mike Fal shows how to use Powershell (or any .NET language) to read parts of a query plan:

Once the pattern is down, the use is pretty straightforward. There’s also more options accessible to you. If we just look at the RunTimeCountersPerThread node, we can compare other values such as Rows, Scans, and CPU time. We could really get crazy and extract all the different statements within the batch. There are numerous possibilities for analysis and review.

I’m not here to tell you that you should start using PowerShell to automate query tuning. Query performance is an art form and requires a lot of case-by-case analysis. However, like any great carpenter, it’s good to know the capabilities of your tool set. Understanding the options available to you not only helps you be more effective, but can also provide answers you may not have had access to.

It’s another tool for the belt.

UDFs And Recompilations

Erik Darling shows how to diagnose a high recompilation problem:

So yeah, that function seems to get up to something once for every ID you pass in. Remember that in our STUFF… query, we grabbed the TOP 10 each time. In the XE session, each time we call the proc, the string splitting function compiles and executes code 10 times. Bummerino. That’s the life of a loop.

On SQL Server 2016 (and really, with any non-looping code), we can get around the constant compilations with a simple rewrite. In this case, I’m calling 2016’s STRING_SPLIT function instead of the MSTVF function.

Read on for more.

Project Normalization In UDFs

Dmitry Pilugin looks into how the optimizer (using the 2014-and-on cardinality estimator) processes user-defined functions:

If we remember, for the CE 120 it was a one row estimate, and in this case server decided, that it is cheaper to use a non-clustered index and then make a lookup into clustered. Not very effective if we remember that our predicate returns all rows.

In CE 130 there was a 365 rows estimate, which is too expensive for key lookup and server decided to make a clustered index scan.

But, wait, what we see is that in the second plan the estimate is also 1 row!

That fact seemed to me very curious and that’s why I’m writing this post. To find the answer, let’s look in more deep details at how the optimization process goes.

This was an interesting look at how the optimizer looks at scalar user-defined functions.

Query Plan Attributes

Grant Fritchey explains the sys.dm_exec_plan_attributes DMV:

There is a DMV that isn’t used a lot of the time because the information within it frequently doesn’t have a lot of bearing on solving fundamental query tuning issues such as out of date statistics, bad or missing indexes, or poorly structured T-SQL. This DMV, sys.dm_exec_plan_attributes, contains a bunch of values that are used by the optimizer to identify a plan in cache, such as object_id (if any), database_id, and compatibility level (compat_level). In addition to these clear & easy to understand attributes, there’s one more, set_options, that’s not immediately clear.

Read on for more information and a sample call.

Estimated Number Of Rows To Be Read

Rob Farley discusses a successful Connect item’s implementation:

I opened it up, and sure enough, no sign of that 7,276 value. It looks just the same as the estimated plan I just showed.

Getting plans out of the cache is where the estimated values come into their own. It’s not just that I’d prefer to not actually run potentially-expensive queries on customer databases. Querying the plan cache is one thing, but running queries to get the actuals – that’s a lot harder.

With SQL 2016 SP1 installed, thanks to that Connect item, I can now see the Estimated Number of Rows to be Read property in estimated plans, and in the plan cache. The operator tooltip shown here is taken from the cache, and I can easily see that Estimated property showing 7,276. This is shown from Management Studio because Plan Explorer doesn’t yet call out this property explicitly:

If you’re looking to use SQL Server 2016 SP1, read the whole thing; this will make query tuning without running those horribly expensive queries a bit easier.

Wanted: Forced Parallelism

James Anderson would like a query hint to force parallelism:

I would like to see a hint that causes the optimiser to consider a parallel plan no matter the cost of the query. It’s possible to get this behaviour with trace flag 8649 but it’s unsupported by Microsoft and therefore unfit for production use.

I only tend to use query hints as a very last resort. It’s almost always better to allow the optimiser to make these decisions and continue to reevaluate these decisions as your data changes but sometimes they can be a get out of jail free card.

I’ve written about this before and so has Paul White. Paul created this connect item for something similar. Please vote it up if you think it could help with your query tuning.

Click through for the full argument, and then hit the Connect item if you agree.

Columnstore Query Patterns

Niko Neugebauer gives a couple hints on how to make columnstore queries as fast as possible:

On my VM with 4 cores it takes 33 seconds to execute this query on SQL Server 2016 with Service Pack 1, while it burns almost 48 seconds of the CPU Time.
The relevant part of the execution plan can be found below, showing so many performance problems that this query is suffering, such as INNER LOOP JOIN, INDEX SPOOL, besides even worse part that is actually hidden and is identifiable only once you open the properties of any of the lower tree (left side of the LOOP JOIN), seeing that it all runs with the Row Execution Mode actually.

To show you the problem, on the left side you will find the properties of the sort iterator that is to be found in the lower (left) part of the LOOP Join that was executed around 770.000 times in the Row Execution Mode, effectively taking any chances away from this query to be executed in a fast way. One might argue that it might that it might be more effective to do the loop part in Row Mode, but given that we are sorting around 3.1 Million Rows there – for me there is no doubt that it would be faster to do it within a Batch Execution Mode. Consulting the last sort iterator in the execution plan (TOP N SORT), you will find that it is running with the help of the Batch Execution Mode, even though it is processing around 770.000 rows.

There’s some valuable information here.

TempDB And Parallelism

Kendra Little looks at cases when a query uses multiple tempdb data files:

As you might guess, things may not always get evenly accessed, even if you have evenly sized tempdb files. One of my queries did a select into a temp table. Although it used all four tempdb files whether or not it went parallel, there were more file_read events against the first tempdb file than against the other four.

It’s an interesting look at this specific question, but also as pedagogical technique.

Computed Columns And Columnstore

Kendra Little exposes a gotcha with non-clustered columnstore indexes and computed columns:

Looking at the execution plan, SQL Server decided to scan the non-clustered columnstore index, even though it doesn’t contain the computed column BirthYear! This surprised me, because I have a plain old non-clustered index on BirthYear which covers the query as well. I guess the optimizer is really excited about that nonclustered columnstore.

Kendra links to a Connect item from Niko Neugebauer to add persisted computed columns to columnstore indexes.

OPTIMIZE FOR Without Parameter Sniffing

Grant Fritchey digs into an interesting combination of settings:

While presenting recently and talking about dealing with bad Parameter Sniffing, I got the question; what happens to OPTIMIZE FOR hints when parameter sniffing is disabled?

This is my favorite kind of question because the answer is simple: I don’t know.

Click through to join Grant in learning what happens.

Categories

May 2017
MTWTFSS
« Apr  
1234567
891011121314
15161718192021
22232425262728
293031