Category: Query Tuning

Nested Views

Published 2016-07-07 by Kevin Feasel

Ginger Grant explains several problems with nested views:

In addition to the performance reasons, there are other reasons not to use nested views. Supporting and maintaining a nested views can be a nightmare. If there is an issue with the accuracy of the data, finding the problem is just that much harder. And what about when you go to fix the problem? Then you need to test all of the places the view is called. I worked with some code recently where a view was created on a set of data just to create 3 new values based up 3 case statements. That view was nested many levels below where the field was actually called. I know people from a development background look at nested views as modular development. Don’t. TSQL is not the same as object oriented code.

Read the whole thing. She also has helpful links digging further into the topic.

Comments closed

Comparing Query Plans

Published 2016-06-17 by Kevin Feasel

Kenneth Fisher wants to compare two query plans using SQL Server 2016:

One of the new/not so new features that I’ve been playing with off and on is the ability to compare two query plans. I’m still relatively new at performance tuning and I frequently like to compare the query plans from before and after any changes I make. Historically I’d bring up both plans next to each other and walk through them section by section. You can see how this method would be a bit of a pain, particularly for large plans or plans that have changed quite a bit. But SSMS 2016 can now help us out.

Lets say for example I’m working on the view [AdventureWorks2014].[HumanResources].[vEmployee]. I decide that the OPTION FAST will speed the overall performance up.

If you can’t get SQL Sentry Plan Explorer installed on your machine for whatever reason, this is a viable alternative.

Comments closed

The Secret Lives Of Seeks

Published 2016-06-17 by Kevin Feasel

Rob Farley digs into what happens with a seek operation:

Let’s go back to our original query, looking for address types 2, 4, and 5, (which returns 2 rows) and think about what’s going on inside the seek.

I’m going to assume the Query Engine has already done the work to figure out that the Index Seek is the right operation, and that it has the page number of the index root handy.

At this point, it loads that page into memory, if it’s not already there. That’s the first read that gets counted in the execution of the seek. Then it locates the page number for the row it’s looking for, and reads that page in. That’s the second read.

But we often gloss over that ‘locates the page number’ bit.

The upshot is rather interesting: in certain edge cases, an uglier query can be better than an easier-to-understand query. If you do this, however, you definitely want to document it; otherwise, you’ll leave the next maintainer (which could be you!) confused.

Comments closed

MAX_GRANT_PERCENT

Published 2016-06-13 by Kevin Feasel

Jack Li gives an example in which MAX_GRANT_PERCENT can keep certain queries from getting runaway memory grants:

The customer has lots of waits on RESOURCE_SEMAPHORE_QUERY_COMPILE. To troubleshoot this, we have to look from two angles. First, did customer have many queries needing large amount of compile memory? Secondly, was it possible that other components used too much memory, causing the threshold lowered? In other words, if SQL Server had enough memory, those queries requiring same amount of compile memory would not have been put to wait.

We used this query and captured for several iterations of data to confirm that server didn’t have queries that required large amount of compile memory per se.

It’s nice to have this trick up your sleeve when you simply can’t get a better query in place.

Comments closed

Actual Rows Read

Published 2016-06-10 by Kevin Feasel

Rob Farley goes into detail on the Actual Rows Read property in execution plans:

The warning says “Operation caused residual IO. The actual number of rows read was 2,130, but the number of rows returned was 2.” Sure enough, further up we see “Actual Rows Read” saying 2,130, and Actual Rows at 2.

Whoa! To find those rows, we had to look through 2,130?

You see, the way that the Seek runs is to start by thinking about the Seek Predicate. That’s the one that leverages the index nicely, and which actually causes the operation to be a Seek. Without a Seek Predicate, the operation becomes a Scan. Now, if this Seek Predicate is guaranteed to be at most one row (such as when it has an equality operator on a unique index), then we have a Singleton seek. Otherwise, we have a Range Scan, and this range can have a Prefix, a Start, and an End (but not necessarily both a Start and an End). This defines the rows in the table that we’re interested in for the Seek.

But ‘interested in’ doesn’t necessarily mean ‘returned’, because we might have more work to do. That work is described in the other Predicate, which is often known as the Residual Predicate.

Definitely worth a read.

Comments closed

New Diagnostic Queries

Published 2016-06-09 by Kevin Feasel

Glenn Berry has a new update of his diagnostic queries, focused around SQL Server 2014 and 2016:

This month, I have several improvements in the SQL Server 2014 and 2016 sets, along with additional comments and documentation in the SQL Server 2012, 2014 and 2016 sets. One new feature is a series of six dashes after each query to make them easier to parse by people who use PowerShell to run the queries and collect the results. I have gotten quite a bit of interest about making a special version of these queries for SQL Database in Microsoft Azure. So, I will be doing that pretty soon.

If you don’t already use Glenn’s queries, I highly recommend them as a starting point in tracking down issues.

Comments closed

Key Lookup Without Output Column

Published 2016-06-03 by Kevin Feasel

Daniel Hutmacher gives us a head-scratcher:

Performance tuning the other day, I was stumped by a query plan I was looking at. Even though I had constructed a covering index, I was still getting a Key Lookup operator in my query plan. What I usually do when that happens is to check the operator’s properties to see what its output columns are, so I can include those columns in my covering index.

Here’s the interesting thing: there weren’t any output columns. What happened?

The answer makes perfect sense, and shows that looking at the SELECT clause isn’t enough.

Comments closed

Kitchen Sink Procedures

Published 2016-06-03 by Kevin Feasel

Aaron Bertrand discusses procedures which satisfy all potential search conditions:

This table has a lot of columns, obviously. Now envision an interface where employees at World Wide Importers could perform searches for specific customers. They may want to perform different searches based on unique combinations of criteria, including (and this is not an exhaustive list by any means, but enough for my purposes today):

CustomerID (e.g. pick the customer from a list);

actual “bill to” CustomerID;

name;

category;

buying group;

when the account was opened;

city; or,

whether the account is on credit hold.

And any combination of the above. I know you’ve seen and probably built interfaces like this before, but just as a quick mockup, the employees would have a screen something like this:

There are plenty of ways to solve this problem, and Aaron shows a couple methods (including one which has major problems).

Comments closed

Spark Optimizations

Published 2016-05-25 by Kevin Feasel

Over at the DZone blog, we learn how to use Distribute By and Cluster By to optimize Spark performance:

Your DataFrame is skewed if most of its rows are located on a small number of partitions, while the majority of the partitions remain empty. You really should avoid such a situation. Why? This makes your application virtually not parallel – most of the time you will be waiting for a single task to finish. Even worse, in some cases you can run out of memory on some executors or cause an excessive spill of data to a disk. All of this can happen if your data is not evenly distributed.

To deal with the skew, you can repartition your data using distribute by. For the expression to partition by, choose something that you know will evenly distribute the data. You can even use the primary key of the DataFrame!

It’s interesting to see how cluster by, distribute by, and sort by can have such different performance consequences.

Comments closed

New XQueryPlanPath Update

Published 2016-05-24 by Kevin Feasel

Richie Lee has an update of XQueryPlanPath:

I’ve made a few changes to the XQueryPlanPath project. The project parses query plans into xml and then using xpath to find the value of one or more nodes. This could then be used in testing to verify that any changes made to a query would retain a query plan that is considered optimal, and then if any changes break the test you can verify if the change causes sub-optimal effect on your query.

There was however one issue – query plans are like opinions; every SQL Server Instance has one, and none of them think that theirs stinks. So running a test on a dev box will potentially produce a different query plan from that on the build server, to that of production etc. This broadly because of 3 reasons:

Check it out, especially if your XML parsing skills aren’t top-notch.

Comments closed

M	T	W	T	F	S	S
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31