Think Set-Based

Paul Randal explains why you should think in a set-based manner when writing T-SQL queries:

This explanation involves a similar problem but not involving SQL Server. Imagine you need to acquire twelve 4′ x 8′ plywood sheets from your local home improvement store.

You could drive to and from the store twelve times, and each time you need to go into the store, purchase the sheet, and wait for a staff member to become available to load the sheet into your pickup truck, then drive home and unload the sheet.

Or you could drive to the store once and purchase all twelve sheets in one go, with maybe four staff members making three trips each out to your pickup, carrying one sheet each time. Or even just one staff member making twelve trips out to your pickup.

Which method is more efficient? Multiple trips to the store or one trip to the store, no matter how many staff members are available to carry the sheets out?

On the SQL Server side, the APPLY operator is a method for bridging set-based operations with procedural thought patterns.  It’s not the perfect answer for everything, but there are some fantastic use cases (like simplifying calculations via chained APPLY operators).


Shane O’Neill explains what SARGable means and why it’s important:

So now that 1). we have our table and b). we have an index we can use, we can run the developer’s query and be SARGable right?

DECLARE @Year INT = 2016;
SELECT [Test_Date] FROM [dbo].[DateTest] WHERE YEAR([Test_Date]) = @Year;

Nope! Table scan, ignores our Index and reads all 15M (too lazy for all the zeros) for a measely 127,782 rows! It’s not the slowest, taking around 3.960 seconds but still, we want SARGable!!!

Watch for the surprise twist at the end.

Transforming Cursors

Mickey Stuewe has a post in which she transforms a cursor into a set-based procedure:

His approach was to use a cursor to cycle through all the columns in the provided table, analyze each column, determine the new data type, and store the information in a table variable. After the cursor was completed, the data in the table variable was written to a permanent table for the next process to use.

This approach isn’t necessarily bad. If you are only running it infrequently and you needed to write this stored procedure quickly, then it’s fine. But if this type of stored procedure needs to be run frequently, then it should be rewritten.

Set-based code tends to be easier to read and more compact than cursors, so even without the performance improvements they bring, there are benefits.

Fast Queries Can Hurt Performance

Andy Mallon diagnoses a fast query which was hurting performance:

This code is coming from an ORM, which is parameterizing the filters, but not the (unnecessary & arbitrary) TOP value. The DMVs all viewed these are separate queries, so it was not aggregating the stats. It also wasn’t reusing the plan (thus chewing up even more CPU from frequent compiles). If the TOP had not been there, or it had been passed as a parameter, my initial query of sys.dm_exec_query_stats should have found it.

There are a couple of issues Andy works through, and his advice is good:  just because something runs quickly doesn’t mean it can’t (in aggregate) have a negative effect on your server.

Views Aren’t Tables

Grant Fritchey dives into the problem with nested views:

That’s a query against the XML stored in the ProductModel table. The view was created to mask the complexity of the necessary XPath code, while providing a mechanism for retrieving the data from the XML. This is a common use of views. However, when we then treat the view as a table, and join it to other tables and views, we present a problem for the optimizer. Because a view is not a table, but is instead a query, the optimizer has to resolve this query in combination with any other views or tables to arrive at an execution plan for the whole combined mess. While the optimizer is very good at what it does, because of the complexity caused by the additional unnecessary processing to figure out which parts of the view is not needed to satisfy the query, it can make poor choices. That can result in poor performance.

“Poor performance” can be an understatement.

Getting The Last Row Per Group

Daniel Hutmacher wants to get the last element in each group (for example, the current records in a type-two dimension):

The CROSS APPLY and the old-school solutions are by far the best choice for dense indexes, i.e. when the first column has a low degree of uniqueness. The old-school solution is only that fast because the optimizer short-circuits the query plan.

LEAD() and the old school strategy are best for selective indexes, i.e. when the first column is highly unique.

There’s a nice set of options available so if one doesn’t work well with your particular data set, try out some of the others and see if they work for you.

Clearing The Azure Procedure Cache

Tim Radney shows us a new way of clearing the procedure cache in Azure SQL Databases (and in 2016 RC0 or later):

It turns out that DBCC FREEPROCCACHE is not supported in Azure SQL Database. This was troubling to me, what if I’m in production and have some bad plans and want to clear the procedure cache like I can with the box version. A little Google/Bing research lead me to find the Microsoft article, “Understanding the Procedure Cache on SQL Azure,” which states:

SQL Azure currently doesn’t support DBCC FREEPROCCACHE (Transact-SQL), so you cannot manually remove an execution plan from the cache.  However, if you make changes to the table or view referenced by the query (ALTER TABLE and ALTER VIEW) the plan will be removed from the cache.

In discussing this with Kimberly Tripp after not seeing that described behavior, it does not flush the plan from cache, but it does invalidate the plan (and then the plan will be eventually aged out of the cache). While this is helpful in certain situations, this was not what I needed. For my demo I wanted to reset the counters in sys.dm_exec_cached_plans. Generating a new plan would not give me the desired results. I reached out to my team and Glenn Berry told me to try the following script:

Read on for the new command, and just like DBCC FREEPROCCACHE, be careful where you point that thing.

Parse Query Plans

Richie Lee writes some C# code to parse query plans:

XPath is the bane of my life… it takes a while to find the correct value particularly as there are so many node names that are re-used yet embedded into them. So I added a simple example and a couple with more depth. Nevertheless, running the test should produce a green result. This could be used for more than just testing; DBA’s may find it useful in SQLCLR scenarios.

I’ve uploaded the code to GitHub. The repository is calledXQueryPlanPath.

9/10 would prefer F#.

Seriously, though, this is a nice start if you need to dig into execution plans programmatically.

TOP 1 Tuning

Kenneth Fisher has a good case study on tuning with the help of Rob Farley:

Here we are looking at the difference between the estimated and actual number of rows for an element of the plan. To look at this information you can either mouse over the element or right click and open the properties tab. In this case you will see that the estimated number of rows (what the optimizer thought would happen) is fairly low (117) particularly compared to what actually happened (1494900). When you see a big difference like that in a query plan there is something wrong.

This is a really nice and detailed walkthrough in which Rob plays Socrates and Kenneth your favorite of the group (Thrasymachus anyone?).

Faking Reads And Writes

Kendra Little shows us how to how to fake reads and writes:

Trainers and speakers need the code they write to be predictable, re-runnable, and as fast as possible. Faking writes can be useful for speakers and teachers who want to be able to generate some statistics in SQL Server’s index dynamic management views or get some query execution plans into cache. The “faking” bit makes the code re-runnable, and usually a bit faster. For writes, it also reduces the risk of filling up your transaction log.

I didn’t invent either of the techniques used below. Both patterns are very common and generic, and so simple that no origin is known.

This isn’t applicable to everyone, but if you’re giving a presentation and want to simulate data access, these are good techniques.


July 2017
« Jun