Query Tuning – Page 53

Avoiding OR Clauses in Joins

Published 2019-06-05 by Kevin Feasel

Erik Darling wants you to embrace the power of AND:

I had to write some hand-off training about query tuning when I was starting a new job.
As part of the training, I had to explain why writing “complicated logic” could lead to poor plan choices.
So I did what anyone would do: I found a picture of a pirate, named him Captain Or, and told the story of how he got Oared to death for giving confusing ORders.

Click through for a troublesome query and a few ways of rewriting it to be less troublesome. My goto is typically to rewrite as two statements with a UNION ALL between them if I can.

Comments closed

Stats IO Oddities

Published 2019-06-03 by Kevin Feasel

Josh Darnell collects a few cases where SET STATISTICS IO ON doesn’t behave quite as you might expect:

The first one comes from a post on Database Administrators Stack Exchange: STATISTICS IO for parallel index scan
To summarize the situation, the OP had a query that was scanning a clustered index. They were seeing significantly higher numbers reported in the logical reads portion of the STATISTICS IO output when the query ran in parallel vs. serially (with a MAXDOP 1 query hint). There is a demo of this behavior in the post, so I won’t reproduce it here.

There are several interesting cases in here, so check them out.

Comments closed

Methods for Rewriting SQL Queries

Published 2019-05-29 by Kevin Feasel

Bert Wagner puts together a list of 12 techniques for tuning SQL queries:

6. DISTINCT with few unique values
Using the DISTINCT operator is not always the fastest way to return the unique values in a dataset. In particular, Paul White uses recursive CTEs to return distinct values on large datasets with relatively few unique values. This is a great example of solving a problem using a very creative solution.

Click through for the full list as well as a video demonstration.

Comments closed

Self-Joins Versus Key Lookups

Published 2019-05-28 by Kevin Feasel

Erik Darling takes us through an interesting scenario:

Like most tricks, this has a specific use case, but can be quite effective when you spot it.
I’m going to assume you have a vague understanding of parameter sniffing with stored procedures going into this. If you don’t, the post may not make a lot of sense.
Or, heck, maybe it’ll give you a vague understanding of parameter sniffing in stored procedures.

This was new to me.

Comments closed

Choosing Between Merge Join and Hash Join

Published 2019-05-22 by Kevin Feasel

Erik Darling gives us a Sophie’s Choice:

It could have chosen a Hash Join, but then the order of the Id column from the Posts table wouldn’t have been preserved on the other side.
Merge Joins are order preserving, Hash Joins aren’t. If we use a Hash Join, we’re looking at ordering the results of the join after it’s done.
But why?

Read on to learn why, as well as why a few other things are so.

Comments closed

Creating Temp Staging Tables to Avoid Spooling

Published 2019-05-08 by Kevin Feasel

Bert Wagner shows how you can create your own tables in tempdb to avoid eager or lazy spools:

SQL Server Spool operators are a mixed bag. On one hand, they can negatively impact performance when writing data to disk in tempdb. On the other hand, they allow filtered and transformed result sets to be temporarily staged, making it easier for that data to be reused again during that query execution.
The problem with the latter scenario is that SQL Server doesn’t always decide to use a spool; often it’s happy to re-read (and re-process) the same data repeatedly. When this happens, one option you have is to explicitly create your own temporary staging table that will help SQL Server cache data it needs to reuse.

The other problem with spooling is that the spool doesn’t have indexes and so performance can be awful. When I look at an execution plan, one of my immediate red flags is spooling: if we have that, removing it is one of the first candidates for optimization after the trivial stuff (expected scan/seek behavior, “fat pipes” from excessive row counts, residual I/O, etc.).

Comments closed

Rewriting Expensive Updates

Published 2019-05-07 by Kevin Feasel

Erik Darling takes us through an experiment:

Let’s also say that bad query is taking part in a modification.
UPDATE u2 SET u2.Reputation *= 2 FROM Users AS u JOIN dbo.Users AS u2 ON CHARINDEX(u.DisplayName, u2.DisplayName) > 0 WHERE u2.Reputation >= 100000; AND u.Id <> u2.Id;
This query will run for so long that we’ll get sick of waiting for it. It’s really holding up writing this blog post.

Erik rewrites this query a couple of times. Click through to learn what he does and why he does it.

Comments closed

Minimal Logging When Inserting into Heaps

Published 2019-05-02 by Kevin Feasel

Paul White gives us the lowdown on minimal logging when performing INSERT..SELECT operations into heap tables:

When inserting rows using INSERT...SELECT into a heap with no nonclustered indexes, the documentation universally states that such inserts will be minimally logged as long as a TABLOCK hint is present. This is reflected in the summary tables included in the Data Loading Performance Guide and the Tiger Team post. The summary rows for heap tables without indexes are the same in both documents (no changes for SQL Server 2016):

But it’s not quite that straightforward, as Paul shows. Read the whole thing.

Comments closed

Breaking Up Queries with UNION ALL

Published 2019-05-01 by Kevin Feasel

Bert Wagner takes us through a scenario where it can be faster to combine queries with UNION ALL rather than using IN:

Even though this query reads the whole clustered index to get the Benefactor rows, the total number of logical reads is still smaller than the seek/key lookup pattern seen in the combined query with IN(). This UNION ALL version gives SQL Server the ability to build a hybrid execution plan, combining two different techniques to generate a plan with fewer overall reads.

Click through for the example.

Comments closed

Linked Servers and Remote Insertion

Published 2019-04-29 by Kevin Feasel

Max Vernon recommends a pull rather than push model when you need to insert cross-server:

Linked Servers offer a great way to connect two SQL Servers together, allowing remote querying and DML operations. Frequently, this is used to copy data from production to reporting. However, the temptation is to run the copy operation on the production, or source side. If you do that, even with a single INSERT INTO statement, SQL Server will process each individual row as a discrete INSERT INTO statement via a cursor operation. This makes for very slow inserts across a linked server. Running the operation from the destination server means SQL Server can simply query the remote source table for all the rows, inserting them as a set into the destination table. The difference in speed can be eye-watering.

Click through for a slightly creepy picture and a less creepy example.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Category: Query Tuning