So yeah, that function seems to get up to something once for every ID you pass in. Remember that in our STUFF… query, we grabbed the TOP 10 each time. In the XE session, each time we call the proc, the string splitting function compiles and executes code 10 times. Bummerino. That’s the life of a loop.
On SQL Server 2016 (and really, with any non-looping code), we can get around the constant compilations with a simple rewrite. In this case, I’m calling 2016’s STRING_SPLIT function instead of the MSTVF function.
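As a point of reference, here’s a minimal sketch of the kind of rewrite being described; the table, column, and function names are hypothetical, and STRING_SPLIT requires database compatibility level 130 or higher:

DECLARE @UserIds varchar(8000) = '1,2,3';

-- Multi-statement TVF approach: the T-SQL function body compiles and executes on every call
SELECT u.DisplayName
FROM dbo.Users AS u
JOIN dbo.SplitInts(@UserIds) AS s      -- hypothetical MSTVF
    ON u.Id = s.Value;

-- SQL Server 2016 rewrite: STRING_SPLIT is built in, so there is no
-- user-written function body to compile and execute over and over
SELECT u.DisplayName
FROM dbo.Users AS u
JOIN STRING_SPLIT(@UserIds, ',') AS s
    ON u.Id = CONVERT(int, s.value);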
Read on for more.
If we recall, under CE 120 there was a one-row estimate, and in that case the server decided it was cheaper to use a non-clustered index and then do a key lookup into the clustered index. Not very efficient, if we remember that our predicate returns all rows.
Under CE 130 there was a 365-row estimate, which is too expensive for a key lookup, so the server decided to do a clustered index scan.
But wait: what we see is that in the second plan the estimate is also 1 row!
That fact seemed very curious to me, and that’s why I’m writing this post. To find the answer, let’s look in more detail at how the optimization process goes.
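If you want to reproduce that sort of comparison yourself, one straightforward (if blunt) way is to flip the database compatibility level, which also switches the cardinality estimator version; the table, column, and UDF names here are placeholders rather than anything from the original post:

-- Compare the estimates under CE 120 and CE 130 by switching compatibility level
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 120;
GO
SELECT t.SomeColumn
FROM dbo.SomeTable AS t
WHERE dbo.ScalarUdf(t.SomeColumn) = 1;   -- check the estimated number of rows in the plan
GO
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 130;
GO
SELECT t.SomeColumn
FROM dbo.SomeTable AS t
WHERE dbo.ScalarUdf(t.SomeColumn) = 1;   -- compare against the CE 120 estimate
GO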
This was an interesting look at how the optimizer looks at scalar user-defined functions.
There is a DMV that isn’t used a lot of the time because the information within it frequently doesn’t have a lot of bearing on solving fundamental query tuning issues such as out of date statistics, bad or missing indexes, or poorly structured T-SQL. This DMV, sys.dm_exec_plan_attributes, contains a bunch of values that are used by the optimizer to identify a plan in cache, such as object_id (if any), database_id, and compatibility level (compat_level). In addition to these clear & easy to understand attributes, there’s one more, set_options, that’s not immediately clear.
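A sample call is roughly this shape (the attribute names below are the ones the DMV actually exposes, such as objectid, dbid, and compat_level):

SELECT cp.plan_handle,
       pa.attribute,
       pa.value,
       pa.is_cache_key          -- whether the attribute participates in the cache lookup
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute IN ('set_options', 'objectid', 'dbid', 'compat_level');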
Read on for more information and a sample call.
I opened it up, and sure enough, no sign of that 7,276 value. It looks just the same as the estimated plan I just showed.
Getting plans out of the cache is where the estimated values come into their own. It’s not just that I’d prefer to not actually run potentially-expensive queries on customer databases. Querying the plan cache is one thing, but running queries to get the actuals – that’s a lot harder.
With SQL 2016 SP1 installed, thanks to that Connect item, I can now see the Estimated Number of Rows to be Read property in estimated plans, and in the plan cache. The operator tooltip shown here is taken from the cache, and I can easily see that Estimated property showing 7,276. This is shown from Management Studio because Plan Explorer doesn’t yet call out this property explicitly:
If you’re looking to use SQL Server 2016 SP1, read the whole thing; this will make query tuning without running those horribly expensive queries a bit easier.
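If you want to go hunting for that property in your own plan cache, a rough sketch along these lines should do it; it assumes the property surfaces in the plan XML as the EstimatedRowsRead attribute on RelOp elements (which appears to be how SQL Server 2016 SP1 exposes it), and TOP (20) is only there to keep the output small:

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       qs.query_hash,
       n.value('@PhysicalOp', 'varchar(60)')   AS physical_op,
       n.value('@EstimateRows', 'float')       AS estimated_rows,
       n.value('@EstimatedRowsRead', 'float')  AS estimated_rows_read
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//RelOp[@EstimatedRowsRead]') AS r(n);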
I would like to see a hint that causes the optimiser to consider a parallel plan no matter the cost of the query. It’s possible to get this behaviour with trace flag 8649 but it’s unsupported by Microsoft and therefore unfit for production use.
I only tend to use query hints as a very last resort. It’s almost always better to allow the optimiser to make these decisions and continue to reevaluate them as your data changes, but sometimes they can be a get-out-of-jail-free card.
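For the record, this is what the unsupported approach looks like at the query level today; the query itself is a throwaway placeholder, and QUERYTRACEON requires sysadmin rights, which is another reason it’s a poor fit for production code:

-- Unsupported: trace flag 8649 makes the optimiser strongly favour
-- a parallel plan whenever one is possible, regardless of cost
SELECT t.SomeColumn, COUNT(*) AS cnt
FROM dbo.SomeTable AS t          -- hypothetical table
GROUP BY t.SomeColumn
OPTION (QUERYTRACEON 8649);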
Click through for the full argument, and then hit the Connect item if you agree.
On my VM with 4 cores it takes 33 seconds to execute this query on SQL Server 2016 with Service Pack 1, while burning almost 48 seconds of CPU time.
The relevant part of the execution plan can be found below, showing the many performance problems this query is suffering from, such as the INNER LOOP JOIN and the INDEX SPOOL. The even worse part is hidden and only identifiable once you open the properties of any operator in the lower tree (left side of the LOOP JOIN): it all runs in Row Execution Mode.
To show you the problem, on the left side you will find the properties of the sort iterator found in the lower (left) part of the LOOP JOIN, which was executed around 770,000 times in Row Execution Mode, effectively taking away any chance of this query executing quickly. One might argue that it could be more effective to do the loop part in Row Mode, but given that we are sorting around 3.1 million rows there, for me there is no doubt that it would be faster in Batch Execution Mode. Consulting the last sort iterator in the execution plan (TOP N SORT), you will find that it is running in Batch Execution Mode, even though it is processing around 770,000 rows.
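As background on why some operators run in batch mode and others don’t: on SQL Server 2016, Batch Execution Mode is only considered when a columnstore index is involved in the query. A commonly cited trick for rowstore-heavy queries is an empty filtered nonclustered columnstore index, sketched below with hypothetical names; this is general advice rather than anything from the original post, and whether it would rescue this particular query is a separate question:

-- The contradictory filter keeps the index empty, but its presence lets the
-- optimiser consider Batch Execution Mode for operators on this table
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_BatchModeEnabler
ON dbo.SomeLargeTable (SomeKeyColumn)
WHERE SomeKeyColumn = -1 AND SomeKeyColumn = -2;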
There’s some valuable information here.
As you might guess, things may not always get evenly accessed, even if you have evenly sized tempdb files. One of my queries did a SELECT INTO a temp table. Although it used all four tempdb files whether or not it went parallel, there were more file_read events against the first tempdb file than against the other three.
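If you want to watch the same thing on your own instance, a minimal sketch of an Extended Events session capturing reads against tempdb (database_id 2) looks something like this; the session name and target file name are made up for the example:

CREATE EVENT SESSION [tempdb_file_reads] ON SERVER
ADD EVENT sqlserver.file_read
    (WHERE (sqlserver.database_id = 2))
ADD TARGET package0.event_file (SET filename = N'tempdb_file_reads.xel');
GO
ALTER EVENT SESSION [tempdb_file_reads] ON SERVER STATE = START;
-- The captured events can then be broken out per tempdb data file.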
It’s an interesting look at this specific question, but also a nice example of pedagogical technique.
Looking at the execution plan, SQL Server decided to scan the non-clustered columnstore index, even though it doesn’t contain the computed column BirthYear! This surprised me, because I have a plain old non-clustered index on BirthYear which covers the query as well. I guess the optimizer is really excited about that nonclustered columnstore.
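The schema shape being described is roughly the following (all names are hypothetical, not taken from the post); the plain nonclustered index on the computed column works fine, but on SQL Server 2016 a nonclustered columnstore index cannot include a computed column at all, hence the Connect item mentioned below:

CREATE TABLE dbo.People
(
    PersonId  int IDENTITY PRIMARY KEY,
    FirstName nvarchar(100) NOT NULL,
    BirthDate date NOT NULL,
    BirthYear AS YEAR(BirthDate)      -- computed column
);

-- A regular rowstore index on the computed column: allowed
CREATE NONCLUSTERED INDEX IX_People_BirthYear ON dbo.People (BirthYear);

-- The columnstore index can only cover the ordinary columns
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_People
    ON dbo.People (PersonId, FirstName, BirthDate);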
Kendra links to a Connect item from Niko Neugebauer to add persisted computed columns to columnstore indexes.
While presenting recently and talking about dealing with bad Parameter Sniffing, I got the question: what happens to OPTIMIZE FOR hints when parameter sniffing is disabled?
This is my favorite kind of question because the answer is simple: I don’t know.
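For anyone who wants to find out alongside him, the combination in question looks roughly like this; PARAMETER_SNIFFING is the SQL Server 2016 database scoped configuration that disables sniffing, and the table and procedure names are hypothetical:

-- Turn parameter sniffing off for the database (SQL Server 2016)
ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;
GO
CREATE PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId int
AS
BEGIN
    SELECT o.OrderId, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerId = @CustomerId
    OPTION (OPTIMIZE FOR (@CustomerId = 42));  -- does this hint still matter? compare the plans
END;
GO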
Click through to join Grant in learning what happens.