Press "Enter" to skip to content

Category: Performance Tuning

The Story Behind Grant’s Execution Plan Book

Grant Fritchey shares a funny (in a Schadenfreude sense) look at what it took to get SQL Server Execution Plans, 3rd Edition out:

Four years ago, after a bunch of dithering and some negotiations with Tony Davis, my editor, I started to update my book, SQL Server Execution Plans. We managed to convince Hugo Kornelis to be the tech editor. I started to do the real writing in early 2015.

I was most of the way through a first draft and no one liked it. Tony was unhappy. Hugo was unhappy. I was unhappy. I was just trying to update the existing book, SQL Server Execution Plans. It wasn’t working.

We all came to the conclusion that the old book was wrong. Not simply in a technical sense, although there was a lot of that, but in a structural sense. So we started rearranging things. SQL Server 2014 came out, but I was ready for it, having been on a preview, so it was no big deal. We started a second draft.

The book Grant mentions is free in PDF format thanks to Red Gate, so go give it a download and enjoy the fruits of Grant’s labor.


A Look At Automatic Plan Tuning In SQL Server 2017

John Sterrett shows us how to review automatic plan tuning suggestions:

SQL Server 2017 Automatic Tuning looks for queries where execution plans change and performance regresses. This feature depends on Query Store being enabled. Note: even if you don’t turn on Automatic Tuning, you still get the benefit of having access to the data. That is right. Automatic Tuning will tell you what it would do if it were enabled. Think of this as free performance tuning training. Go look at the DMVs and try to understand why the optimizer would want to lock in an execution plan. We will actually go through a real-world example:

Click through for the example.  I wouldn’t automatically trust these automatic results, but my experience has been generally positive.
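If you want to see what the feature would recommend on your own system, a minimal sketch against sys.dm_db_tuning_recommendations (SQL Server 2017 or later, with Query Store enabled on the database) might look like this:

SELECT name,
	reason,                               -- why the recommendation was generated
	score,                                -- estimated impact, on a scale of 0 to 100
	JSON_VALUE(state, '$.currentValue')   AS current_state,
	JSON_VALUE(details, '$.implementationDetails.script') AS force_plan_script
FROM sys.dm_db_tuning_recommendations;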


Mining The Plan Cache, Query Store, And More

Erin Stellato shows the benefit of digging through the plan cache, Query Store, and third-party performance tool databases (using SentryOne’s SQL Sentry as an example):

As much as I love all this extra data, it’s important to note that some information is more relevant for an actual execution plan, versus an estimated one (e.g. tempdb spill information). Some days we can capture and use the actual plan for troubleshooting, other times we have to use the estimated plan. Very often we get that estimated plan – the plan that has potentially been used for problematic executions – from SQL Server’s plan cache. And pulling individual plans is appropriate when tuning a specific query or set of queries. But what about when you want ideas on where to focus your tuning efforts in terms of patterns?

The SQL Server plan cache is a prodigious source of information when it comes to performance tuning, and I don’t simply mean troubleshooting and trying to understand what’s been running in a system. In this case, I’m talking about mining information from the plans themselves, which are found in sys.dm_exec_query_plan, stored as XML in the query_plan column.

When you combine this data with information from sys.dm_exec_sql_text (so you can easily view the text of the query) and sys.dm_exec_query_stats (execution statistics), you can suddenly start to look for not just those queries that are the heavy hitters or execute most frequently, but those plans that contain a particular join type, or index scan, or those that have the highest cost. This is commonly referred to as mining the plan cache, and there are several blog posts that talk about how to do this. My colleague, Jonathan Kehayias, says he hates to write XML yet he has several posts with queries for mining the plan cache:

It’s a good article with a lot of useful information.
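To give a flavor of the technique, here is a minimal plan cache mining sketch that hunts for cached plans containing a clustered index scan; swap the operator name to search for other patterns:

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (25)
	st.text AS query_text,
	qs.execution_count,
	qs.total_worker_time,
	qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.query_plan.exist('//RelOp[@PhysicalOp="Clustered Index Scan"]') = 1
ORDER BY qs.total_worker_time DESC;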


Power BI August Release And SSAS Performance Improvements

Chris Webb points out something new in the Power BI August 2018 release:

While I was playing around with the new release (August 2018) of Power BI Desktop, I noticed there was an undocumented change: similar to the OData improvements I blogged about here, there is a new option in the AnalysisServices.Database() and AnalysisServices.Databases() M functions that turns on a newer version of the MDX generation layer used by the Power Query engine. Like the OData improvements it is an option called Implementation="2.0", used like this:

AnalysisServices.Databases(
	"localhost", 
	[
		TypedMeasureColumns=true, 
		Implementation="2.0"
	]
)

…and also, as with the OData improvements, you will need to manually edit any existing M queries to take advantage of this.

Read on for Chris’s test and analysis of the resulting MDX output.


In Defense Of Inline Table-Valued Functions

Riley Major defends the honor of inline table-valued functions:

So no, user-defined functions are not the devil. Scalar user-defined functions can cause big problems if misused, but generally inline user-defined functions do not cause problems.

The real rule of thumb is not to avoid functions, but rather to avoid adorning your index fields with logic or functions. Because when you hide your intentions from the optimizer with complex syntax, you risk not getting the better performing index seek.

Riley shows an example where his inline table-valued UDF produced an execution plan just as efficient as the one for the query without the UDF.
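For illustration, here is a minimal sketch of the inline pattern, with dbo.Orders as a hypothetical table. Because the body is a single SELECT, the optimizer expands the function into the calling query's plan and can still seek on an index over OrderDate:

CREATE FUNCTION dbo.GetOrdersSince (@Since date)
RETURNS TABLE
AS
RETURN
(
	SELECT OrderID, CustomerID, OrderDate
	FROM dbo.Orders
	WHERE OrderDate >= @Since -- sargable: no function wrapped around the indexed column
);
GO

-- Called like a parameterized view; the plan matches the equivalent inline query.
SELECT o.OrderID, o.CustomerID
FROM dbo.GetOrdersSince('20180101') AS o;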


CXCONSUMER Not Entirely Harmless

Erik Darling points out a case where CXCONSUMER waits are important:

In this sample there are absolutely no waits whatsoever on CXPACKET.

They are nonexistent.

If you were hoping to find out that they were way crazy out of control (call a priest, ditch your bar tab, we’re driving on the train tracks), you’ll be awfully disappointed.

There just aren’t any.

There’s only one core in use for nearly the entire duration, aside from some blips.

That’s disappointing.  I was hoping to be able to ignore this wait altogether.
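If you want to see how the two waits stack up on your own instance, a quick check against sys.dm_os_wait_stats:

SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'CXPACKET', N'CXCONSUMER')
ORDER BY wait_time_ms DESC;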


TANSTAAQRC (Query Result Cache)

Andy Mallon explains that a query result cache does not exist in SQL Server:

I was recently doing a training session when a developer commented that it was OK to run an expensive query twice because on the second execution, SQL Server would use the “results cache” and be “practically free”. It’s not the first time I’ve heard someone refer to a “results cache” in SQL Server. This is one of those myths that is almost true, which makes it that much more believable. If you don’t know better, you might think SQL Server has a “results cache” because the second execution of a query is often faster.

SQL Server does not have a “results cache” and the second execution is not “practically free.”
SQL Server does have a “buffer cache” and the second execution is “faster, but not free.”

The SQL Server buffer cache holds data pages in memory, in the exact form that they reside on disk. The second execution will not have to perform physical I/O operations to satisfy the query, because it can use the buffer cache. However, it does have to perform all other operations. Think of it like this: the second execution still executes the entire execution plan, including all the expensive operations. It is faster, but not “practically free.”

Read the comments for Erik Darling’s plot twist.
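If you want to see the buffer cache effect for yourself, here is a minimal sketch, with dbo.SomeBigTable standing in for any sufficiently large table of yours (DBCC DROPCLEANBUFFERS belongs on test systems only):

DBCC DROPCLEANBUFFERS; -- test systems only: empty the buffer cache
GO
SET STATISTICS IO, TIME ON;

SELECT COUNT(*) FROM dbo.SomeBigTable; -- physical reads plus logical reads
GO
SELECT COUNT(*) FROM dbo.SomeBigTable; -- logical reads only: faster, but the full plan still runs
GO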


When Wait Stats Aren’t Enough

Joe Obbish has an example of diagnosing performance problems when wait stats don’t indicate any problems:

In summary, page allocations and page free events rapidly occur, sometimes in an alternating pattern. SQL Server will often free a number of pages just to immediately request allocations for a similar number of pages. If all of the free page events result in returned memory to the OS, then the reason for the scalability bottleneck becomes clear. When running the full workload with 96 concurrent sessions, a total of 341965 page free operations were performed. Those events freed about 71.3 million pages in total. That amounts to about 584 GB of memory returned to the OS in total, based on the previous assumptions.

This is a great investigation into the depths of debugging in SQL Server.  Joe wasn’t able to get a definitive solution to his problem, but he showed us a lot along the way.
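If you want to watch the engine's memory from the OS perspective while a workload runs, one simple starting point is sys.dm_os_process_memory:

SELECT physical_memory_in_use_kb,
	locked_page_allocations_kb,
	memory_utilization_percentage
FROM sys.dm_os_process_memory;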


Nested Loops, Hash, Or Merge: Which Is Best?

Grant Fritchey dodges the important questions:

First response, also a joke, was the question at the title of this post:

What is the preferred operator when joining tables: Hash Match, Nested Loops or Merge?

While my immediate response to this question is, yes. Meaning, they’re all preferred, situationally. I decided to expand on that a bit.

I completely agree with Grant: there is no single best operator. If there were, database companies wouldn’t offer multiple options. That said, in an ideal world, all joins would be merge joins; in our fallen world, nested loops and hash matches are often the better alternatives.
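If you want to compare all three operators against the same query, join hints make for a quick experiment. Here is a sketch against hypothetical Orders and Customers tables; hints override the optimizer, so treat this as a learning exercise rather than production code:

SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER LOOP JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID; -- nested loops

SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER HASH JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID; -- hash match

SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
INNER MERGE JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID; -- merge join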


Parallelism Strategies For Grouping Operations

Itzik Ben-Gan continues his series on grouping data in SQL Server by looking at how these operations can go parallel:

Besides needing to choose between various grouping and aggregation strategies (preordered Stream Aggregate, Sort + Stream Aggregate, Hash Aggregate), SQL Server also needs to choose whether to go with a serial or a parallel plan. In fact, it can choose between multiple different parallelism strategies. SQL Server uses costing logic that results in optimization thresholds that under different conditions make one strategy preferred to the others. We’ve already discussed in depth the costing logic that SQL Server uses in serial plans in the previous parts of the series. In this section I’ll introduce a number of parallelism strategies that SQL Server can use for handling grouping and aggregation. Initially, I won’t get into the details of the costing logic, rather just describe the available options. Later in the article I’ll explain how the costing formulas work, and an important factor in those formulas called DOP for costing.

As you will later learn, SQL Server takes into account the number of logical CPUs in the machine in its costing formulas for parallel plans. In my examples, unless I say otherwise, I assume the target system has 8 logical CPUs. If you want to try out the examples that I’ll provide, in order to get the same plans and costing values as I do, you need to run the code on a machine with 8 logical CPUs as well. If your machine happens to have a different number of CPUs, you can emulate a machine with 8 CPUs, for costing purposes, like so:

DBCC OPTIMIZER_WHATIF(CPUs, 8);

Even though this tool is not officially documented and supported, it’s quite convenient for research and learning purposes.

This is a fairly long article, but a great one.
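If you want to experiment along with the article, here is a sketch of a simple grouped aggregation against a hypothetical dbo.Orders table, with the CPU emulation applied first (both the DBCC command and the hint are undocumented, so keep them on test systems):

DBCC OPTIMIZER_WHATIF(CPUs, 8); -- emulate 8 logical CPUs for costing

SELECT CustomerID, COUNT(*) AS order_count
FROM dbo.Orders
GROUP BY CustomerID
OPTION (USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE')); -- nudge the optimizer toward a parallel plan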
