Press "Enter" to skip to content

Category: Internals

Row Goals and Query Optimizer Estimates

Hugo Kornelis explains how row goals can change query optimizer behavior:

The most simple is when the query literally tells SQL Server that you don’t want to have all rows returned. Everyone knows the TOP clause, which is most commonly used for this. For ANSI portability, and because it adds a few options, you should also be aware of the FETCH and OFFSET modifiers to the ORDER BY clause, that have a similar functionality and are specifically designed to support paging. And there is of course the SET ROWCOUNT option, though I sincerely hope nobody actually uses that. All of these options literally tell SQL Server that we don’t want all results, only a part of them. The execution plan that would produce the entire set the fastest might not necessarily be the fastest way to get the few rows we actually want, so it’s a good thing that the optimizer has a way to come up with a different execution plan for these cases.

But there are plenty of other ways you might get a row goal, so check them out.

Comments closed

Another Batch of Common Query Plan Patterns

Erik Darling continues pulling one-star query plan patterns. First up is common subexpression spools:

The first time I heard the term “common subexpression spool” my brain went numb for a week.

It’s not a particularly mellifluous phrase, but it is helpful to understand what it is.

One easy way to think about it is a temporary cache for the result of a query, like a temp table inside your execution plan:

Then we look at optimizer choices around sorting:

Sometimes these Sorts are harmless, and sometimes they’re not. There are many situational things about the queries, indexes, available resources, and query plan appropriateness that will lead you to treating things differently.

Parameter sniffing, cardinality estimate accuracy, query concurrency, and physical memory are all potential reasons for these choices going great or going grog.

Read both of the whole things.

Comments closed

Inserted and Deleted Scans

Hugo Kornelis has a two-parter for us. First up is the inserted scan operator:

The Inserted Scan operator is only found in execution plans for code in triggers. It is used for queries that read data from the inserted pseudo-table. Its counterpart, Deleted Scan, reads from the deleted pseudo-table.

This pseudo-table contains a copy of all the rows that were inserted in AFTER INSERT triggers, or the new content of the data in all affected rows in AFTER UPDATE triggers. In INSTEAD OF INSERT or INSTEAD OF UPDATE triggers, the data in the inserted pseudo-table is the data that would have been inserted, or the data as it would have been after the update. In AFTER DELETE and INSTEAD OF DELETE triggers, using the inserted pseudo-table is allowed but returns no data.

Then, the deleted scan:

This pseudo-table contains a copy of all rows that has just been deleted in AFTER DELETE triggers, or the original data of all affected rows in AFTER UPDATE triggers. In INSTEAD OF DELETE or INSTEAD OF UPDATE triggers, the data in the deleted pseudo-table is the current data in the rows that would have been deleted or updated. In AFTER INSERT and INSTEAD OF INSERT triggers, using the deleted pseudo-table is allowed but returns no data.

Click through to see how they work.

Comments closed

Page and Data Row Structure in SQL Server

Deepthi Goguri digs into data page internals:

Each byte you see in the picture has a purpose. The first two blocks containing a byte, Status Bit A and Status Bit B contains the bitmap information about the row, like if the row is logically been deleted/ghosted, row type information, versioning tag, if the row contains any NULL values, Variable length columns. The next 2 bytes is used for storing the length of the fixed length data. The next n bytes are for storing the fixed length data itself. There is a null bitmap after that which will have both the 2-byte column count in the row and null bitmap array. Regardless of if the column in null or not, each and every column will have one bit per every column.

Read on to see how those 8kb pages fill up so quickly.

Comments closed

Describing the Physical Join Operators

Deepthi Goguri explains the three physical join operators in SQL Server:

Merge join is not a bad thing and it may be efficient already in your execution plan. What you have to observe when you see the merge joins and performance slow on that plan is to focus on the upstream operations that are going into the merge join. Whether the data is presorted as you already have an index or whether the data is presorted in SQL Server own way then in that case, you can simply check if you can just add that missing column in the index and place in the last key column in the index or use a different join algorithm will be better. The other scenario might be you have lots of duplicate values in your data. If that is the case SQL Server will be using the work tables to handle how the duplicate values can be joined on. So, if you see the duplicate values or using tempdb, then finding the better options will be good.

Click through for more detail. Each physical operator has its place and does quite well within it, but the challenge comes when the optimizer thinks a particular route is better than the one you had in mind.

Comments closed

Window Functions in Row and Batch Modes

Erik Darling digs into a new series:

To start things off, we’re going to talk about query plan patterns related to windowing functions.

There are several things to consider with windowing function query plans:

– Row vs Batch mode
– With and Without Partition By
– Index Support for Partition and Order By
– Column SELECTion
– Rows vs Range/Global aggregates

We’ll get to them in separate posts, because there are particulars about them that would make covering them all in a single post unwieldy.

Anyway, the first one is pretty simple, and starting simple is about my speed.

Read on for this quick coverage of row mode versus batch mode processing with respect to window functions.

Comments closed

Object ID 99

Paul Randal talks us through a special object ID:

Earlier today I was answering a question on the #sqlhelp tag on Twitter and I mentioned using object ID 99 as a value, because SQL Server will never set a table to be object ID 99. And so I thought it would make a good topic for a quick blog post.

So what is object ID 99?

Read on for the answer.

Comments closed

The Importance of LSNs to SQL Server

Jack Vamvas explains a concept:

I was talking to an Auditor recently – who specialises in large Corporate Audits – and they asked me how would I prove a certain database which is backed up is actually restored to another server.  One of the methods I described was using the Log Sequence Numbers (LSN).     

Read on for an explanation of how they work and how you can use LSNs to solve that auditing issue.

Comments closed

Fun with Disabling Joins

David Alcock gives us a “wouldn’t it be neat to see this?” scenario:

If you’ve ever seen my session on Guillotines and the Query Optimiser I demonstrated how we can use query hints to switch off certain transformation rules in a query. Transformation rules use a pattern substitution which essentially swaps one expression for another. In SQL Server terms the best example of this is a join as although we use the logical expressions such as left join or inner join the physical join type used in the execution plan operator is substituted to be something like a Hash Match or Nested Loop Join.

This means we can force the behaviour of an execution plan by disabling certain rules. Now I can’t think of any practical reason for this and remember we can also use join hints in our queries but this is different as it affects the entire optimisation process for a query and not just one join so please follow in a sandbox environment.

In case you haven’t seen David’s session, it’s available on YouTube.

Comments closed

Execution Plan Operator Timings

Paul White dives into a murky problem:

SQL Server 2014 SP2 and later produce runtime (“actual”) execution plans that can include elapsed time and CPU usage for each execution plan operator (see KB3170113 and this blog post by Pedro Lopes).

Interpreting these numbers is not always as straightforward as one might expect. There are important differences between row mode and batch mode execution, as well as tricky issues with row mode parallelism. SQL Server makes some timing adjustments in parallel plans to promote consistency, but they are not perfectly implemented. This can make it difficult to draw sound performance-tuning conclusions.

This article aims to help you understand where the timings come from in each case, and how they can be best interpreted in context.

My default advice is to read what Paul White writes, and this is no exception.

Comments closed