Press "Enter" to skip to content

Category: Query Tuning

Exchange Demand Partitioning and Parallel Queries

Joe Obbish takes us through a fun concurrency problem:

Very little has been written about exchange operators with a partitioning type of demand, so I forgive you for not hearing of it before today. There is a brief explanation available here, an example of using demand partitioning to improve some query plans involving partitioned tables, and a Stack Exchange answer for someone comparing round robin and demand partitioning. You have the honor of reading perhaps the fourth blog post about the subject.

Read on for an in-depth look at the problem.
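
If you want to check whether any of your own cached plans use demand partitioning, one option is to search the showplan XML, which records each exchange's partitioning type. A minimal sketch, assuming the PartitioningType attribute of the Parallelism element carries the value "Demand" (worth verifying against the showplan schema for your SQL Server version):

-- Sketch: find cached plans containing an exchange (Parallelism)
-- operator whose partitioning type is Demand.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20) cp.usecounts, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan.exist('//RelOp/Parallelism[@PartitioningType="Demand"]') = 1;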

Misleading Query Plan Estimates from Missing Nodes

Hugo Kornelis points out another issue to watch for when checking arrow widths while reading execution plans:

The issue I will talk about today is caused when a node is missing, due to a post-optimization rewrite. I myself have so far only seen this in execution plans where a bitmap filter (created by a Bitmap operator, by a Batch Hash Table Build operator, or by a Hash Match operator with a BitmapCreator property) was pushed into an Index Scan, Clustered Index Scan, or Columnstore Index Scan operator. So that’s what I’ll focus on here. But do keep in mind that there may be other situations where a post-optimization rewrite removes a node.

To understand the root cause of this (and, credit where credit is due, I only understood this myself after getting it explained by Paul White, whose explanation I am now paraphrasing here), we need to dig deeper into the internals of the optimizer.

This is where you put on the internals mining cap and watch out for falling rocks.
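
To reproduce the shape Hugo describes, a minimal sketch with invented Sales and Customer tables: in a parallel row-mode plan, a hash join can build a bitmap from its small, filtered build side and push it into the scan of the large probe side, where it appears as a PROBE(...) residual on the scan rather than as a separate Bitmap node.

-- Sketch (hypothetical tables): a hash join whose build side is a
-- selective filter. In a parallel plan, the bitmap built from the
-- filtered customers may be pushed into the scan of dbo.Sales.
SELECT s.OrderID, s.Amount
FROM dbo.Sales AS s
JOIN dbo.Customer AS c
  ON c.CustomerID = s.CustomerID
WHERE c.Region = N'NW'
OPTION (HASH JOIN); -- nudge the optimizer toward the hash join shape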

Rows Read Versus Rows Returned

Hugo Kornelis explains another issue with arrow widths in tools like SQL Server Management Studio:

The visual of the arrows in an execution plan strongly suggests that they represent the flow of rows from one operator to another. And hence, the width of that arrow strongly appears to be an indication of how many rows are passed between the connected operators. That used to be always the case. But unfortunately, this changed in December 2017 (or earlier, but that was when I first noticed the change).

The root cause is the addition of a new property, returned by Scan and Seek operators: Number of Rows Read (as well as its estimated counterpart: Estimated Number of Rows to be Read). Now don’t get me wrong, those properties are awesome and I’m super happy that they were added to execution plans. When a filter condition is pushed into a scan or seek operator as a Predicate property, I want to be able to see how effective it is, and comparing Actual Number of Rows to Number of Rows Read (or Estimated Number of Rows to Estimated Number of Rows to be Read) helps me assess just that. Which in turn can help me decide whether I should change my indexing, or try to rewrite the query. But I digress.

Click through to learn what the issue is. Hugo describes a tricky situation where there are two valuable measures but only one way to show them. If you agree with Hugo’s preferences, here’s a Feedback item for you.
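
To see the two measures diverge, a minimal sketch (table and column names invented): with no index on the filtered column, the filter is pushed into the scan as a Predicate property, so the operator reads every row but returns only the matches.

-- Sketch: assumes dbo.Orders has no index on OrderDate. Number of
-- Rows Read counts every row the scan touched; Actual Number of
-- Rows counts only those that passed the pushed Predicate.
SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '20240101';

A large gap between the two counters is the signal Hugo describes for reconsidering the indexing or rewriting the query.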

When Arrow Widths Mislead

Hugo Kornelis provides one example of misleading arrow widths in execution plans:

In my previous post, I talked about the Actual Number of Rows and Estimated Number of Rows properties, their visual representation in execution plans, and the most important ways in which this can be used.

But life is not always perfect, and Microsoft likes to remind us of that. Sometimes, these properties report values in weird and confusing ways. Sometimes the arrow width, as the visual representation in the execution plan, misleads us.

Let’s look at one such case.

Read the whole thing and vote for Hugo’s Feedback item to improve this experience.

Row Counts and Arrow Widths in SSMS

Hugo Kornelis takes us through one of the niceties SQL Server Management Studio (and subsequent tools like SentryOne Plan Explorer and Azure Data Studio) has for reading execution plans:

Because both the Actual Number of Rows and the Estimated Number of Rows are very valuable for many use cases, Microsoft has built their primary tool for viewing execution plans (SQL Server Management Studio) to highlight the value of at least one of these two properties, in a visual way, when you look at the graphical representation of an execution plan.

Look at the arrows that show which operator (is called by and) returns rows to which operator, and that therefore can be thought of as representing the data stream. In most execution plans, they are not all the same width. Management Studio uses the width of the arrow as a quick, simple visual representation of the value of Actual Number of Rows (in an execution plan plus run-time statistics, aka “execution plan plus” or “actual execution plan”), or of the value of the Estimated Number of Rows in an execution plan only (aka “estimated execution plan”).

This first post is the happy path post, where the information is correct and useful. Later posts promise to cover cases in which this does not hold.
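
For reference, the run-time counts behind those arrow widths require an execution plan plus run-time statistics; one way to capture one from a query window is the following (SSMS renders the showplan XML this returns):

-- Capture an "actual" plan for the statement in between; the arrows
-- in the rendered plan are then sized by Actual Number of Rows.
SET STATISTICS XML ON;
SELECT COUNT(*) FROM sys.objects;
SET STATISTICS XML OFF;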

Issues with Window Functions and Views / Derived Tables

Paul White takes us through some of the difficulties you are liable to see when using window functions in views and derived tables:

Our expectation is that the execution plan for this new query will be exactly the same as before we created the view. The query optimizer should be able to push the product filter specified in the WHERE clause down into the view, resulting in an index seek.

We need to stop and think a bit at this point, however. The query optimizer can only produce execution plans that are guaranteed to produce the same results as the logical query specification. Is it safe to push our WHERE clause into the view?

Read the whole thing.
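
To make the question concrete, a sketch with invented names: a predicate on the PARTITION BY column can safely be pushed below the window function, because removing whole partitions cannot change the numbering of the rows that remain; a predicate on most other columns could, so the optimizer must not push it.

-- Sketch: a view containing a window function.
CREATE VIEW dbo.RankedOrders
AS
SELECT
    ProductID,
    OrderID,
    ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY OrderID) AS rn
FROM dbo.OrderLines;
GO
-- Filtering on the PARTITION BY column is safe to push below the
-- ROW_NUMBER, so this can become a seek on a ProductID index.
SELECT ProductID, OrderID, rn
FROM dbo.RankedOrders
WHERE ProductID = 870;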

The Halloween Problem, Continued

Paul White continues a series on the Halloween Problem. Part 2 looks at insert and delete statements:

In the first part of this series, we saw how the Halloween Problem applies to UPDATE queries. To recap briefly, the problem was that an index used to locate records to update had its keys modified by the update operation itself (another good reason to use included columns in an index rather than extending the keys). The query optimizer introduced an Eager Table Spool operator to separate the reading and writing sides of the execution plan to avoid the problem. In this post, we will see how the same underlying issue can affect INSERT and DELETE statements.
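
A quick illustration of the INSERT flavor, as a sketch with an invented table: when the insert target is also the source, the reading side could otherwise encounter the rows being written, so the optimizer separates the phases, typically with an Eager Table Spool that reads everything before the insert begins.

-- Sketch: a self-insert that needs Halloween protection. Without
-- phase separation, the scan could pick up newly inserted rows.
INSERT INTO dbo.Staging (Val)
SELECT Val + 1
FROM dbo.Staging;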

Part 3 looks at the train wreck MERGE operator:

In the right circumstances, the SQL Server optimizer can recognize that the MERGE statement is hole-filling, which is just another way of saying that the statement only adds rows where there is an existing gap in the target table’s key.

For this optimization to be applied, the values used in the WHEN NOT MATCHED BY TARGET clause must exactly match the ON part of the USING clause. Also, the target table must have a unique key (a requirement satisfied by the PRIMARY KEY in the present case).

Where these requirements are met, the MERGE statement does not require protection from the Halloween Problem.

If only it weren’t busted in so many other ways!
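
A sketch of the hole-filling shape, with invented Target and Source tables, instantiating the two requirements quoted above: the WHEN NOT MATCHED branch inserts exactly the key values compared in the ON clause, and the target's key is unique.

-- Sketch: hole-filling MERGE. Id is the PRIMARY KEY of dbo.Target,
-- and the inserted Id exactly matches the ON clause, so the
-- optimizer can prove Halloween protection is unnecessary.
MERGE dbo.Target AS t
USING dbo.Source AS s
    ON t.Id = s.Id
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Val) VALUES (s.Id, s.Val);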

Part 4 wraps up the series:

The SQL Server optimizer has specific features that allow it to reason about the level of Halloween Protection (HP) required at each point in the query plan, and the detailed effect each operator has. These extra features are incorporated into the same property framework the optimizer uses to keep track of hundreds of other important bits of information during its search activities.

Each operator has a required HP property and a delivered HP property. The required property indicates the level of HP needed at that point in the tree for correct results. The delivered property reflects the HP provided by the current operator and the cumulative HP effects provided by its subtree.

This last one goes into some nice detail.

Reading Query Plans in Spark

Daniel Ciocirlan has a primer on query plans in Apache Spark:

Let’s go over some examples of query plans and how to read them. Let’s go back to the one we’ve just shown:

== Physical Plan ==
*(1) Project [(id#0L * 5) AS id#2L]
+- *(1) Range (1, 1000000, step=1, splits=6)

We read this plan backwards, bottom to top: the Range operator at the bottom generates the id values (across six splits), and the Project above it computes id * 5 for each of those rows.

Spark does have some UI components which make this a bit easier, but you’ll probably end up in a situation where you need to read it in this format.
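
If you want to generate a plan like this yourself, Spark SQL can print one directly; a minimal sketch using its built-in range table-valued function, equivalent to the DataFrame code behind the plan above:

-- EXPLAIN prints the physical plan; this query is equivalent to
-- spark.range(1, 1000000).select($"id" * 5).
EXPLAIN SELECT id * 5 AS id FROM range(1, 1000000);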

I Remember Halloween

Paul White talks about the Halloween Problem:

Much has been written over the years about understanding and optimizing SELECT queries, but rather less about data modification. This series looks at an issue that is specific to INSERT, UPDATE, DELETE, and MERGE queries – the Halloween Problem.

The phrase “Halloween Problem” was originally coined with reference to a SQL UPDATE query that was supposed to give a 10% raise to every employee who earned less than $25,000. The problem was that the query kept giving 10% raises until everyone earned at least $25,000.

We will see later on in this series that the underlying issue also applies to INSERT, DELETE, and MERGE queries, but for this first entry, it will be helpful to examine the UPDATE problem in a bit of detail.

This is a classic problem in data management and has led to a good bit of confusion over time about why database updates can perform worse than you’d expect.
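
For reference, the classic statement looks something like this sketch (schema invented): if an index on Salary is used to locate the rows, each raise can move a row ahead of the current scan position, where it is found and raised again.

-- Sketch of the classic Halloween statement. SQL Server prevents the
-- repeat-raise behavior with phase separation, e.g. an Eager Table
-- Spool between the reading and writing sides of the plan.
UPDATE dbo.Employees
SET Salary = Salary * 1.10
WHERE Salary < 25000;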

When a Non-Clustered Index on Clustered Columns Makes Sense

Allen White gives us a scenario where adding a non-clustered index on the same column as the clustered index can make sense:

Recently I was asked about adding a non-clustered index to a table (let’s call it Images) with just one column. It had been added in the development database and it improved performance dramatically. I looked at it and it had the same key as the clustered index on that table.

In reviewing the query I saw that Images was joined to the other tables in the query, but none of the columns were used, so Images was joined to ensure that values from the other tables had rows in Images. The query plan shows a significantly higher number of reads against Images without the new NCI (non-clustered index) than when it’s present.

I do agree that this can help—as we obviously see. The backseat query tuner in me wonders if maybe there’s another way to write the query to prevent the scan by using CROSS APPLY, but that’d only help if they were getting a small percentage of rows from the parent table expression built from the combination of the clustered index scan and index seek in the second example.
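
A sketch of the index in question, with invented names: a single-column non-clustered index on the clustering key carries only that key, so it is far narrower than the clustered index, and a join that only needs to verify a row exists in Images can read it with far fewer pages.

-- Sketch: Images is assumed to be clustered on ImageId. This narrow
-- duplicate of the key gives existence-check joins a cheaper access
-- path than scanning the full clustered index.
CREATE NONCLUSTERED INDEX IX_Images_ImageId
    ON dbo.Images (ImageId);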
