
Category: Internals

Binary and Hexadecimal in Brief

Randolph West has a primer for us:

The CPU can only work with small amounts of data at a time. When the CPU is done with that data, it puts it in memory. Depending on how that data is managed (the on-or-off bit of information is literally called a bit because computer scientists are great at naming things), it usually forms part of a byte, which by convention is eight bits. In other words, you get eight bits of information per byte, represented by a series of 1s (electricity flowed for a fixed period of time) and 0s (no electricity flowed for that fixed period of time).

The uppercase “A” in the Latin alphabet, using the ASCII standard, is represented by the byte 01000001.

So, cool, this is binary. What does it actually mean? 

Read on for more.
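
If you want to see that mapping for yourself, T-SQL's ASCII and CHAR functions translate between a character and its numeric code; the binary expansion in the comment is just arithmetic:

    -- 'A' is code point 65; 65 = 64 + 1 = 2^6 + 2^0 = 01000001 in binary
    SELECT ASCII('A') AS DecimalValue,   -- returns 65
           CHAR(65)   AS CharacterValue; -- returns 'A'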


Finding Index Fragmentation

Deepthi Goguri is hunting the most dangerous predator:

The bad page splits are the ones we learned about in the previous post: the split that occurs when a random insert has to happen and there is no space on the page, so a new page gets created during the split. These page splits are very expensive and cause fragmentation. Good page splits occur when append-only inserts happen: as the pages at the right side of the index fill up, new pages get added at the right side of the index. These good page splits don’t cause any index fragmentation. SQL Server groups these two types of page splits together and does not differentiate between them. So, how do we differentiate between the good and the bad page splits? Let’s learn more about this.

It is very difficult to differentiate these page splits using the existing methods we have in SQL Server, like the perfmon Page Splits/sec counter, which counts the good and the nasty page splits together. There is also a DMV, sys.dm_db_index_operational_stats, and a page_split extended event for tracking page splits.

Read on to see how we can find those undesirable page splits versus the benign ones.
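
As a starting point, here is a sketch of querying the DMV the post mentions; note that its allocation counters, like the perfmon counter, lump the good and bad splits together, which is exactly the problem the post works through:

    -- leaf_allocation_count includes leaf-level page splits, good and bad alike
    SELECT OBJECT_NAME(ios.object_id) AS table_name,
           i.name                     AS index_name,
           ios.leaf_allocation_count,
           ios.nonleaf_allocation_count
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
        INNER JOIN sys.indexes AS i
            ON i.object_id = ios.object_id
            AND i.index_id = ios.index_id
    ORDER BY ios.leaf_allocation_count DESC;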


The Mechanisms of Page Splits

Deepthi Goguri takes us through page splitting in SQL Server:

Page splits can cause logical fragmentation and low page density, and they can bloat the transaction log. Page splits are very expensive. As we learned from the previous post, a page split happens when SQL Server tries to insert a row and there is no more room on the page to fit it, so the page is split to make space for the record. When SQL Server has to insert a row into a page, it first checks the free space recorded in the page header, and if the record fits within those free bytes, it goes in. If the free space is not contiguous within the page but there is enough total space for the record, SQL Server compacts the page in memory to make the free space contiguous; this is not a page split. A page split occurs when the space is simply not available on the page, in which case the page splits roughly in half, with a split point of about 50 percent. Sometimes the storage engine chooses a different split point on the page, which is known as a skewed page split. This is even more expensive than a regular page split, as it generates an even larger transaction log.

What causes the page splits?

Read on for the full article.
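
If you would like to watch a split happen, here is a minimal repro sketch; the table and values are invented for illustration, and the CHAR(2000) padding just ensures only four rows fit on an 8KB page:

    CREATE TABLE dbo.SplitDemo
    (
        id  INT NOT NULL PRIMARY KEY CLUSTERED,
        pad CHAR(2000) NOT NULL DEFAULT ''
    );

    -- Four ~2KB rows fill the page, leaving gaps in the key order
    INSERT dbo.SplitDemo (id) VALUES (1), (3), (5), (7);

    -- No room remains for a mid-range key, so this insert forces a page split:
    -- roughly half the rows move to a newly allocated page
    INSERT dbo.SplitDemo (id) VALUES (4);

    -- Page count rises and logical fragmentation appears
    SELECT index_level, page_count, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID('dbo.SplitDemo'), NULL, NULL, 'DETAILED');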


Types of Fragmentation on Index Pages in SQL Server

Deepthi Goguri explains what sorts of fragmentation can occur on an index in SQL Server:

Logical fragmentation occurs when the logical order of the leaf-level pages (logical order meaning the order of the next key values) no longer matches the physical order of the pages in the data file. Pages that are out of order affect the read-ahead mechanism and scan performance: because of logical fragmentation, read-ahead has to issue smaller reads.

If the fragmented pages are already in memory, the read-ahead mechanism is not affected. Logical fragmentation is usually a problem for bigger indexes rather than smaller ones (smaller meaning roughly 1,000-5,000 pages). You can monitor the amount of logical fragmentation in an index using the DMV sys.dm_db_index_physical_stats.

Read on to learn more about logical fragmentation, as well as extent fragmentation and low page density.
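
Here is one way you might call that DMV; the 1,000-page filter follows the post's rule of thumb about small indexes:

    SELECT OBJECT_NAME(ips.object_id)          AS table_name,
           i.name                              AS index_name,
           ips.page_count,
           ips.avg_fragmentation_in_percent,   -- logical fragmentation
           ips.avg_page_space_used_in_percent  -- page density
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
        INNER JOIN sys.indexes AS i
            ON i.object_id = ips.object_id
            AND i.index_id = ips.index_id
    WHERE ips.page_count > 1000
    ORDER BY ips.avg_fragmentation_in_percent DESC;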


Doodles about the Storage Engine

Forrest McDaniel explains via image:

Paul Randal is a SQL Server legend with loads of informative articles. But when I was a baby DBA first reading Inside the Storage Engine, I got a little stuck. It took many passes before, eventually, finally, it clicked. I wish I had a lightweight introduction, so in the practice of paying it forward…

Here’s the starting point: sometimes it’s easier to manage lots of small things (say, the 1s and 0s of data) by grouping them into larger things. It’s the same reason you don’t buy rice by the grain.

Read on for that introduction to the storage engine.


Row Goals and Query Optimizer Estimates

Hugo Kornelis explains how row goals can change query optimizer behavior:

The simplest is when the query literally tells SQL Server that you don’t want to have all rows returned. Everyone knows the TOP clause, which is most commonly used for this. For ANSI portability, and because it adds a few options, you should also be aware of the OFFSET and FETCH modifiers to the ORDER BY clause, which have similar functionality and are specifically designed to support paging. And there is of course the SET ROWCOUNT option, though I sincerely hope nobody actually uses that. All of these options literally tell SQL Server that we don’t want all results, only a part of them. The execution plan that would produce the entire set the fastest might not necessarily be the fastest way to get the few rows we actually want, so it’s a good thing that the optimizer has a way to come up with a different execution plan for these cases.

But there are plenty of other ways you might get a row goal, so check them out.
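
For reference, here are the three row goal triggers Hugo names, in skeletal form (the table is a placeholder):

    -- TOP: the classic row goal
    SELECT TOP (10) *
    FROM dbo.Orders
    ORDER BY OrderDate DESC;

    -- OFFSET/FETCH on ORDER BY: ANSI-style paging with the same effect
    SELECT *
    FROM dbo.Orders
    ORDER BY OrderDate DESC
    OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;

    -- SET ROWCOUNT: legacy, affects every subsequent statement until reset
    SET ROWCOUNT 10;
    SELECT * FROM dbo.Orders ORDER BY OrderDate DESC;
    SET ROWCOUNT 0;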


Another Batch of Common Query Plan Patterns

Erik Darling continues pulling apart one-star query plan patterns. First up is common subexpression spools:

The first time I heard the term “common subexpression spool” my brain went numb for a week.

It’s not a particularly mellifluous phrase, but it is helpful to understand what it is.

One easy way to think about it is a temporary cache for the result of a query, like a temp table inside your execution plan:

Then we look at optimizer choices around sorting:

Sometimes these Sorts are harmless, and sometimes they’re not. There are many situational things about the queries, indexes, available resources, and query plan appropriateness that will lead you to treating things differently.

Parameter sniffing, cardinality estimate accuracy, query concurrency, and physical memory are all potential reasons for these choices going great or going grog.

Read both of the whole things.
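
Erik's temp table analogy can be made literal. A common subexpression spool caches an intermediate result once and reads it back multiple times, conceptually like this hand-rolled version (table and column names are placeholders):

    -- Compute the shared result once...
    SELECT s.CustomerID, SUM(s.Amount) AS total
    INTO #shared
    FROM dbo.Sales AS s
    GROUP BY s.CustomerID;

    -- ...then read it twice without recomputing
    SELECT * FROM #shared WHERE total > 1000;
    SELECT AVG(total) FROM #shared;

    DROP TABLE #shared;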


Inserted and Deleted Scans

Hugo Kornelis has a two-parter for us. First up is the inserted scan operator:

The Inserted Scan operator is only found in execution plans for code in triggers. It is used for queries that read data from the inserted pseudo-table. Its counterpart, Deleted Scan, reads from the deleted pseudo-table.

This pseudo-table contains a copy of all the rows that were inserted in AFTER INSERT triggers, or the new content of the data in all affected rows in AFTER UPDATE triggers. In INSTEAD OF INSERT or INSTEAD OF UPDATE triggers, the data in the inserted pseudo-table is the data that would have been inserted, or the data as it would have been after the update. In AFTER DELETE and INSTEAD OF DELETE triggers, using the inserted pseudo-table is allowed but returns no data.

Then, the deleted scan:

This pseudo-table contains a copy of all rows that have just been deleted in AFTER DELETE triggers, or the original data of all affected rows in AFTER UPDATE triggers. In INSTEAD OF DELETE or INSTEAD OF UPDATE triggers, the data in the deleted pseudo-table is the current data in the rows that would have been deleted or updated. In AFTER INSERT and INSTEAD OF INSERT triggers, using the deleted pseudo-table is allowed but returns no data.

Click through to see how they work.
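
To see both operators in one plan, a garden-variety audit trigger works; the tables and columns here are hypothetical:

    CREATE TRIGGER trg_Products_PriceAudit
    ON dbo.Products
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- deleted holds pre-update rows (Deleted Scan);
        -- inserted holds post-update rows (Inserted Scan)
        INSERT dbo.PriceAudit (ProductID, OldPrice, NewPrice, ChangedAt)
        SELECT d.ProductID, d.Price, i.Price, SYSUTCDATETIME()
        FROM deleted AS d
            INNER JOIN inserted AS i
                ON i.ProductID = d.ProductID
        WHERE i.Price <> d.Price;
    END;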


Page and Data Row Structure in SQL Server

Deepthi Goguri digs into data page internals:

Each byte you see in the picture has a purpose. The first two one-byte blocks, Status Bits A and Status Bits B, contain bitmap information about the row: whether the row has been logically deleted/ghosted, the row type, the versioning tag, and whether the row contains any NULL values or variable-length columns. The next two bytes store the length of the fixed-length data, and the next n bytes store the fixed-length data itself. After that comes the null bitmap, which holds both the two-byte column count for the row and the null bitmap array, with one bit per column regardless of whether the column is null.

Read on to see how those 8KB pages fill up so quickly.
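
If you want to poke at a page yourself, the usual route is the undocumented DBCC PAGE command; a rough sketch, with the file and page numbers as placeholders you would substitute from the allocation query:

    DBCC TRACEON(3604);  -- send DBCC PAGE output to the client

    -- Find a data page for a table (this DMF is also undocumented)
    SELECT allocated_page_file_id, allocated_page_page_id
    FROM sys.dm_db_database_page_allocations
        (DB_ID(), OBJECT_ID('dbo.SomeTable'), NULL, NULL, 'DETAILED')
    WHERE page_type_desc = 'DATA_PAGE';

    -- Dump it: print option 3 shows each row's status bits,
    -- fixed-length data, and null bitmap
    DBCC PAGE (0, 1, 312, 3);  -- (database_id, file_id, page_id, print_option)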


Describing the Physical Join Operators

Deepthi Goguri explains the three physical join operators in SQL Server:

Merge join is not a bad thing, and it may already be efficient in your execution plan. When you see merge joins in a slow plan, focus on the upstream operations feeding into the join: is the data presorted because an index already provides the order, or has SQL Server sorted it on its own? In the latter case, check whether you can add the missing column as the last key column in the index, or whether a different join algorithm would be better. The other scenario is having lots of duplicate values in your data: in that case, SQL Server will use worktables to handle how the duplicate values get joined. So if you see duplicate values or tempdb use, it is worth looking for better options.

Click through for more detail. Each physical operator has its place and does quite well within it, but the challenge comes when the optimizer thinks a particular route is better than the one you had in mind.
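
If you want to compare the three side by side, join hints will force each algorithm (tables here are placeholders); just remember that a join hint also fixes the join order, so treat this as a diagnostic exercise rather than a production fix:

    SELECT o.OrderID, c.CustomerName
    FROM dbo.Orders AS o
        INNER MERGE JOIN dbo.Customers AS c  -- merge join: wants sorted inputs
            ON c.CustomerID = o.CustomerID;

    SELECT o.OrderID, c.CustomerName
    FROM dbo.Orders AS o
        INNER HASH JOIN dbo.Customers AS c   -- hash join: builds a hash table first
            ON c.CustomerID = o.CustomerID;

    SELECT o.OrderID, c.CustomerName
    FROM dbo.Orders AS o
        INNER LOOP JOIN dbo.Customers AS c   -- nested loops: one inner probe per outer row
            ON c.CustomerID = o.CustomerID;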
