Press "Enter" to skip to content

Category: Query Tuning

JFDI

Sometimes, Michael J. Swart says, it’s better to just do it:

Okay, this is getting out of hand. The query shouldn’t have to be this complicated.
Luckily I work with a guy named Chris. He’s amazing at what he does. He questions everything without being a nitpicker (there’s a difference). He read through the Mythbusters post and followed all the links in the comments. He asked whether gbn’s JFDI pattern wasn’t better here. So I implemented it just to see what that looked like:

I’ve ended up doing the same thing in a similar scenario.  But as Aaron Bertrand notes in the comments, test your results because performance could end up being even worse than before.
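
For reference, gbn’s JFDI pattern skips the up-front existence check entirely: attempt the insert and let a constraint violation signal the duplicate. A minimal T-SQL sketch, with hypothetical table and column names (error 2627 is a primary key or unique constraint violation):

BEGIN TRY
    -- Just try the insert; no IF EXISTS check first
    INSERT INTO dbo.UserEmail (UserID, Email)
    VALUES (@UserID, @Email);
END TRY
BEGIN CATCH
    -- 2627 = PK/unique constraint violation, i.e., the row already exists;
    -- swallow that one error and re-raise anything else
    IF ERROR_NUMBER() <> 2627
        THROW;
END CATCH;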


CROSS APPLY

Steve Jones compares writing a function versus using CROSS APPLY to write the same function:

The conclusion I’d take here is that CROSS APPLY ought to be a tool you keep in the front of your toolbox and use when you must execute a function for each row of a set of tables. This is one of the T-SQL  techniques that I never learned early in my career (it wasn’t available), and I haven’t used much outside of looking for execution plans, but it’s a join capability I will certainly look to use in the future.

I’m one of the biggest fans of the APPLY operator out there—my favorite talk is based on it, even.  But in this case, I’m going to say that writing “CROSS APPLY” really didn’t do anything here—times are similar enough that I’d be suspicious that the database engine is doing the same thing both times.
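
To make the comparison concrete, here is the general shape of the two approaches, using hypothetical object names: a scalar UDF evaluated once per row versus an inline table-valued function invoked through CROSS APPLY.

-- Scalar UDF called once per row:
SELECT o.OrderID, dbo.GetOrderTotal(o.OrderID) AS Total
FROM dbo.Orders AS o;

-- Same result via CROSS APPLY against an inline TVF:
SELECT o.OrderID, t.Total
FROM dbo.Orders AS o
CROSS APPLY dbo.GetOrderTotalTVF(o.OrderID) AS t;

The win, when there is one, comes from the inline TVF being expanded into the calling query’s plan; simply wrapping the same scalar function in CROSS APPLY changes nothing, which would explain the near-identical timings.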


Query Tricks Which Don’t Trick

Gail Shaw has a follow-up with more query “tricks” that aren’t fooling anyone:

In a similar vein to last week’s blog post… I heard an interesting comment recently. “Change that Column != 2 to a Column > 2 or Column < 2 combination, it can use indexes better.”

Sounds like something that clearly needs testing!

Not shockingly, this did nothing to make the query run faster or use fewer resources.  There are ways to rewrite queries to improve performance while maintaining the same result structure (a common example being rewriting a query that uses a cursor or WHILE loop to perform one set-based operation instead), but Gail’s point is vital: test your changes, and if you claim a rewrite will perform better, make sure it actually performs better.
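
The claim is easy to test side by side. With a hypothetical indexed column, both predicates produce the same plan:

-- Original predicate:
SELECT OrderID
FROM dbo.Orders
WHERE Quantity <> 2;

-- The “trick” rewrite; the optimizer treats it identically:
SELECT OrderID
FROM dbo.Orders
WHERE Quantity > 2 OR Quantity < 2;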


Cardinality Estimation And String Splits

Dan Holmes points out a quirk of estimated row counts with CLR-based functions:

That is an enormous amount of data.  What if you needed to sort that?  What if you joined this to another table or view and a spool was required?  What if it was a hash join and a memory grant was required?  The demand that this seemingly innocuous statement placed on your server could be overwhelming.

The memory grant could create system variability that is very difficult to find.  There is a thread on MSDN that I started which exposes what prompted this post.  (The plan that was causing much of the problem is at this link.)

It’s important to keep in mind the “good enough” big round figures that SQL Server uses for row estimation when statistics are unavailable (e.g., a linked server to Hive, or a CLR function as in the post).  These estimates aren’t always correct, and there are edge cases like the one in the post where the estimates are radically wrong and start to hurt your server.
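
One common mitigation, sketched here with a hypothetical CLR splitter, is to materialize the function’s output into a temp table first, so the optimizer has real statistics before the expensive join or sort:

-- The optimizer has no statistics on a CLR TVF’s output, only a fixed guess
SELECT s.Value
INTO #split
FROM dbo.clr_SplitString(@csv, N',') AS s;

-- #split carries real statistics, so downstream estimates (and memory
-- grants) are based on the actual row count
SELECT t.*
FROM dbo.SomeTable AS t
INNER JOIN #split AS s
    ON s.Value = t.Code;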


Cardinality Estimation And Hints

Aaron Bertrand uses a Visual Studio Online outage to talk about query hints:

These are all things that may have been necessary under the old estimator, but are likely just tying the optimizer’s hands under the new one. This is a query that could have, and should have, been tested in their dev / staging / QA environments under the new cardinality estimator long before they flipped the switch in production, and probably could have gone through a series of tests where different combinations of those hints and options could have been removed. This is something for which that team can only blame themselves.

Also check out Aaron Morelli’s comment on the post.
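
If you’re testing a migration to the new cardinality estimator, SQL Server 2014 offers a narrower escape hatch than smothering a statement in hints; a sketch with a hypothetical query:

-- Evaluate just this query under the legacy (pre-2014) estimator;
-- trace flag 2312 does the reverse on an older compatibility level
SELECT w.WorkItemID, w.Title
FROM dbo.WorkItems AS w
WHERE w.AssignedToID = @UserID
OPTION (QUERYTRACEON 9481);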


Plan Cache Spelunking

Ed Pollack digs into the plan cache:

The data in the plan cache is not static, and will change over time. Execution plans, along with their associated query and resource metrics, will remain in memory for as long as they are deemed relevant. Plans can be removed from cache when there is memory pressure, when they age out (they get stale), or when a new plan is created, rendering the old one obsolete. Keep this in mind as we are looking around: the info we find in the plan cache is transient, indicative of a server’s current and recent activity, and does not reflect a long-term history. As a result, be sure to do thorough research on the plan cache prior to making any significant decisions based on that data.

The plan cache is one of the best ways of figuring out what’s going on in your SQL Server instances, but there’s a little bit of complexity to it.
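
A typical starting point for that kind of spelunking is joining the plan cache DMVs together, for example:

SELECT TOP (20)
    qs.execution_count,
    qs.total_worker_time,
    qs.total_logical_reads,
    st.text AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;

Just remember Ed’s caveat: these numbers only cover what is currently in cache.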


Sorting Power Pivot Data On Load

Matt Allington suggests pre-sorting results to reduce load in Power Pivot:

Imagine you have 50,000 products in your data table and you have 50,000,000 rows of data.  Power Pivot will take the first 1 million rows it comes to (1 segment worth), work out how to sort and compress the columns, and then compress the data into a single segment before moving to the next 1 million rows it comes to (in the order they are loaded).  When it does this, it is highly likely that every product number will appear in every single segment – all 50 segments.  If we assume an equal number of product records for each product (unlikely but OK for this discussion), then there would be 1,000 records for each product spread throughout the entire data table, and each and every segment is likely to contain all 50,000 product IDs.  This is not good for compression.

This is an interesting result and not something I would have thought intuitive.
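
The practical takeaway is to push the sort into the source query so each product’s rows arrive contiguously; a sketch with hypothetical names:

-- Clustering rows by ProductID before Power Pivot carves them into
-- 1M-row segments means each segment sees far fewer distinct products,
-- which compresses much better:
SELECT ProductID, SaleDate, Quantity, Amount
FROM dbo.Sales
ORDER BY ProductID;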


Beware ROWLOCK Hints

Kendra Little points out that ROWLOCK hints might make blocking worse:

Note that the logical reads are the exact same and neither query is doing physical reads (the execution plans are the same; the optimizer doesn’t care what locks you are using). The queries were run with SET STATISTICS IO, TIME OFF and Execution Plans turned off, just to reduce influencing factors on duration and CPU.

The database engine is simply having to do more work here. Locking the pages in the clustered index is less work than locking each of the 1,825,433 rows.

Even though our locks are more granular, making queries run longer by taking out individual locks will typically lead to more blocking down the road.
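
The hint itself is a one-word change, which is part of why it gets reached for so casually; a sketch with a hypothetical table:

-- ROWLOCK tells the engine to take shared locks row by row rather than
-- by page; against a table like the 1,825,433-row one in the post, that
-- is 1.8 million locks to acquire and release instead of far fewer page locks:
SELECT COUNT(*)
FROM dbo.Votes WITH (ROWLOCK);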

Kendra follows up with several optimization possibilities, so read the whole thing.
