Press "Enter" to skip to content

Category: Internals

The Mechanisms of Page Splits

Deepthi Goguri takes us through page splitting in SQL Server:

Page splits go hand in hand with logical fragmentation and low page density, and they can generate a huge amount of transaction log. Page splits are very expensive. As we learned from the previous post, a page split happens when SQL Server tries to insert a row and there is no more space on the page to fit it, so the page is split to make room for that record. When SQL Server has to insert rows into a page, it first checks the free space recorded in the page header; if the record fits within those free bytes, it goes in. If the free space within the page is not contiguous but there is enough total space for the record, SQL Server performs an in-memory compaction of the page to make the free space contiguous. That is not a page split. A page split occurs when there is not enough space on the page to fit the record; in that case the page is split in two, and the split point is usually 50 percent. Sometimes the storage engine chooses a different split point on the page, which is known as a skewed page split. That is even more expensive than a regular page split because it generates an even larger amount of transaction log.

What causes page splits?

Read on for the full article.
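If you want to watch split activity on your own system while you read, the sys.dm_db_index_operational_stats DMF keeps a running count of leaf-level page allocations, which include pages created by splits. A minimal sketch, with dbo.Orders standing in as a placeholder table:

-- Leaf-level allocations for one table's indexes; for an index, a leaf
-- allocation generally corresponds to a page split.
SELECT  OBJECT_NAME(ios.object_id)  AS table_name,
        i.name                      AS index_name,
        ios.leaf_allocation_count,
        ios.nonleaf_allocation_count
FROM    sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL) AS ios
JOIN    sys.indexes AS i
        ON  i.object_id = ios.object_id
        AND i.index_id  = ios.index_id;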

Types of Fragmentation on Index Pages in SQL Server

Deepthi Goguri explains what sorts of fragmentation can occur on an index in SQL Server:

Logical fragmentation occurs when the logical order of the leaf-level pages (logical order meaning the order of the key values) no longer matches the physical order of the pages in the data file. Pages that are out of order like this hurt the read-ahead mechanism and scan performance: because of the logical fragmentation, read-ahead has to issue smaller reads.

If the logically fragmented pages are already in memory, the read-ahead mechanism is not affected. Logical fragmentation usually causes problems for bigger indexes rather than smaller ones (smaller meaning roughly 1,000-5,000 pages). You can monitor the amount of logical fragmentation in an index by using the DMV sys.dm_db_index_physical_stats.

Read on to learn more about logical fragmentation, as well as extent fragmentation and low page density.
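If you want to check your own indexes while reading, the DMV she mentions returns both the logical fragmentation and the page density in one pass; a quick sketch against a hypothetical dbo.Orders table (SAMPLED or DETAILED mode is needed to populate the page density column):

SELECT  i.name                              AS index_name,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,   -- logical fragmentation
        ips.avg_page_space_used_in_percent, -- page density
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'SAMPLED') AS ips
JOIN    sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;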

Doodles about the Storage Engine

Forrest McDaniel explains via image:

Paul Randal is a SQL Server legend with loads of informative articles. But when I was a baby DBA first reading Inside the Storage Engine, I got a little stuck. It took many passes before, eventually, finally, it clicked. I wish I had a lightweight introduction, so in the practice of paying it forward…

Here’s the starting point: sometimes it’s easier to manage lots of small things (say, the 1s and 0s of data) by grouping them into larger things. It’s the same reason you don’t buy rice by the grain.

Read on for that introduction to the storage engine.

Row Goals and Query Optimizer Estimates

Hugo Kornelis explains how row goals can change query optimizer behavior:

The simplest is when the query literally tells SQL Server that you don’t want to have all rows returned. Everyone knows the TOP clause, which is most commonly used for this. For ANSI portability, and because it adds a few options, you should also be aware of the FETCH and OFFSET modifiers to the ORDER BY clause, which have similar functionality and are specifically designed to support paging. And there is of course the SET ROWCOUNT option, though I sincerely hope nobody actually uses that. All of these options literally tell SQL Server that we don’t want all results, only a part of them. The execution plan that would produce the entire set the fastest might not necessarily be the fastest way to get the few rows we actually want, so it’s a good thing that the optimizer has a way to come up with a different execution plan for these cases.

But there are plenty of other ways you might get a row goal, so check them out.
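For reference, here are the three ways Hugo names of asking for only part of a result set, each of which can introduce a row goal. The table and columns are placeholders:

-- TOP: the most common way to state a row goal explicitly.
SELECT TOP (50) OrderID, OrderDate
FROM dbo.Orders
ORDER BY OrderDate DESC;

-- OFFSET-FETCH on ORDER BY: the ANSI-style option, designed for paging.
SELECT OrderID, OrderDate
FROM dbo.Orders
ORDER BY OrderDate DESC
OFFSET 50 ROWS FETCH NEXT 50 ROWS ONLY;

-- SET ROWCOUNT: the legacy option; best avoided in new code.
SET ROWCOUNT 50;
SELECT OrderID, OrderDate FROM dbo.Orders ORDER BY OrderDate DESC;
SET ROWCOUNT 0;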

Another Batch of Common Query Plan Patterns

Erik Darling continues pulling one-star query plan patterns. First up is common subexpression spools:

The first time I heard the term “common subexpression spool” my brain went numb for a week.

It’s not a particularly mellifluous phrase, but it is helpful to understand what it is.

One easy way to think about it is a temporary cache for the result of a query, like a temp table inside your execution plan:
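One classic query shape that tends to produce this kind of spool in a row-mode plan is a window aggregate with only a PARTITION BY: the rows for each partition get cached in a Table Spool and then replayed so the per-partition total can be joined back to the detail rows. A minimal sketch, with dbo.Sales as a placeholder:

-- In row mode this typically shows up as Segment -> Table Spool, with the
-- spool read more than once: the cached "common subexpression".
SELECT  CustomerID,
        Amount,
        SUM(Amount) OVER (PARTITION BY CustomerID) AS customer_total
FROM    dbo.Sales;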

Then we look at optimizer choices around sorting:

Sometimes these Sorts are harmless, and sometimes they’re not. There are many situational things about the queries, indexes, available resources, and query plan appropriateness that will lead you to treating things differently.

Parameter sniffing, cardinality estimate accuracy, query concurrency, and physical memory are all potential reasons for these choices going great or going wrong.

Read both of the whole things.

Inserted and Deleted Scans

Hugo Kornelis has a two-parter for us. First up is the inserted scan operator:

The Inserted Scan operator is only found in execution plans for code in triggers. It is used for queries that read data from the inserted pseudo-table. Its counterpart, Deleted Scan, reads from the deleted pseudo-table.

This pseudo-table contains a copy of all the rows that were inserted in AFTER INSERT triggers, or the new content of the data in all affected rows in AFTER UPDATE triggers. In INSTEAD OF INSERT or INSTEAD OF UPDATE triggers, the data in the inserted pseudo-table is the data that would have been inserted, or the data as it would have been after the update. In AFTER DELETE and INSTEAD OF DELETE triggers, using the inserted pseudo-table is allowed but returns no data.

Then, the deleted scan:

This pseudo-table contains a copy of all rows that have just been deleted in AFTER DELETE triggers, or the original data of all affected rows in AFTER UPDATE triggers. In INSTEAD OF DELETE or INSTEAD OF UPDATE triggers, the data in the deleted pseudo-table is the current data in the rows that would have been deleted or updated. In AFTER INSERT and INSTEAD OF INSERT triggers, using the deleted pseudo-table is allowed but returns no data.

Click through to see how they work.
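If you have not written a trigger in a while, here is a minimal sketch of how the two pseudo-tables pair up in an AFTER UPDATE trigger (dbo.Orders and dbo.OrderAudit are hypothetical):

CREATE TRIGGER trg_Orders_Audit
ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.OrderAudit (OrderID, OldStatus, NewStatus, ChangedAt)
    SELECT  d.OrderID,
            d.Status,            -- value before the update, from deleted
            i.Status,            -- value after the update, from inserted
            SYSUTCDATETIME()
    FROM    inserted AS i
    JOIN    deleted  AS d
            ON d.OrderID = i.OrderID;
END;

Reading from inserted and deleted inside that trigger body is what produces the Inserted Scan and Deleted Scan operators in its execution plan.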

Page and Data Row Structure in SQL Server

Deepthi Goguri digs into data page internals:

Each byte you see in the picture has a purpose. The first two bytes, Status Bit A and Status Bit B, contain bitmap information about the row: whether the row has been logically deleted/ghosted, the row type, whether there is a versioning tag, and whether the row contains any NULL values or variable-length columns. The next two bytes store the length of the fixed-length data. The next n bytes store the fixed-length data itself. After that comes the null bitmap, which holds both the two-byte column count for the row and the null bitmap array; every column gets one bit there, regardless of whether the column is null or not.

Read on to see how those 8kb pages fill up so quickly.
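If you want to poke at a page yourself, here is a sketch of two common approaches: sys.dm_db_page_info (SQL Server 2019 and later) is documented, while fn_PhysLocFormatter and DBCC PAGE are undocumented but widely used for this sort of exploration. The table, database name, and page numbers below are placeholders:

-- Find the file:page:slot location of each row in a table.
SELECT  sys.fn_PhysLocFormatter(%%physloc%%) AS file_page_slot, o.*
FROM    dbo.Orders AS o;

-- Summarize the page header for one page (file 1, page 12345 here).
SELECT  page_type_desc, slot_count, free_bytes, ghost_rec_count
FROM    sys.dm_db_page_info(DB_ID(), 1, 12345, 'DETAILED');

-- Dump the whole page, including each row's status bits and null bitmap.
DBCC TRACEON(3604);
DBCC PAGE (N'YourDatabase', 1, 12345, 3);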

Describing the Physical Join Operators

Deepthi Goguri explains the three physical join operators in SQL Server:

Merge join is not a bad thing, and it may already be efficient in your execution plan. When you see a merge join and the plan is slow, focus on the upstream operations feeding into it. Is the data presorted because an index already provides that order, or is SQL Server sorting it on its own? In the latter case, check whether simply adding the missing column as the last key column in the index, or using a different join algorithm, would be better. The other scenario is having lots of duplicate values in your data. If that is the case, SQL Server will use worktables to handle how the duplicate values get joined, so if you see duplicate values or tempdb usage, it is worth looking for better options.

Click through for more detail. Each physical operator has its place and does quite well within it, but the challenge comes when the optimizer thinks a particular route is better than the one you had in mind.
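As a side note, if you want to see the three operators side by side on the same data, join hints make for an easy (if heavy-handed) demo. The tables and index here are placeholders, and hints like these are for experimentation rather than production code:

-- An index on the join key lets a merge join consume presorted input.
CREATE INDEX IX_OrderLines_OrderID ON dbo.OrderLines (OrderID);

SELECT  o.OrderID, ol.LineTotal
FROM    dbo.Orders AS o
JOIN    dbo.OrderLines AS ol
        ON ol.OrderID = o.OrderID
OPTION (MERGE JOIN);   -- swap in OPTION (HASH JOIN) or OPTION (LOOP JOIN) to compare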

Window Functions in Row and Batch Modes

Erik Darling digs into a new series:

To start things off, we’re going to talk about query plan patterns related to windowing functions.

There are several things to consider with windowing function query plans:

– Row vs Batch mode
– With and Without Partition By
– Index Support for Partition and Order By
– Column SELECTion
– Rows vs Range/Global aggregates

We’ll get to them in separate posts, because there are particulars about them that would make covering them all in a single post unwieldy.

Anyway, the first one is pretty simple, and starting simple is about my speed.

Read on for this quick coverage of row mode versus batch mode processing with respect to window functions.
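One note in passing: on versions before SQL Server 2019, a common way to make a rowstore query eligible for batch-mode window aggregates is an empty filtered nonclustered columnstore index. This is a well-known trick rather than anything from Erik's post, and dbo.Sales is a placeholder:

-- The contradictory filter keeps the columnstore index empty; its presence
-- is enough to let the optimizer consider batch mode for queries on dbo.Sales.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales_BatchMode
ON dbo.Sales (SaleID)
WHERE SaleID = -1 AND SaleID = -2;

SELECT  CustomerID,
        Amount,
        SUM(Amount) OVER (PARTITION BY CustomerID
                          ORDER BY SaleDate
                          ROWS UNBOUNDED PRECEDING) AS running_total
FROM    dbo.Sales;

On SQL Server 2019 and later, batch mode on rowstore can kick in without the extra index.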

Object ID 99

Paul Randal talks us through a special object ID:

Earlier today I was answering a question on the #sqlhelp tag on Twitter and I mentioned using object ID 99 as a value, because SQL Server will never set a table to be object ID 99. And so I thought it would make a good topic for a quick blog post.

So what is object ID 99?

Read on for the answer.
