Press "Enter" to skip to content

Category: Internals

Whither Code Page 1

Solomon Rutzky unravels a mystery:

Collation names, at least in SQL Server, are structured in that they contain various pieces of information detailing some of the properties of the collation. The collation name of SQL_Latin1_General_CP1_CI_AS can be broken down as follows:

SQL_ This indicates that the collation is a SQL Server collation, while names without this prefix indicate Windows collations

Latin1_General This is the culture (a.k.a. locale) used for the sorting and comparison rules

CP1 This is the 8-bit [C]ode [P]age

CI This indicates [C]ase- [S]ensitive or [I]nsensitive

AS This indicates [A]ccent- [S]ensitive or [I]nsensitive

Ok, but there’s no code page “1”. So, what’s going on there?

Read on to learn the answer behind this mystery.

Comments closed

Binary and Hexadecimal in Brief

Randolph West has a primer for us:

The CPU can only work with small amounts of data at a time. When the CPU is done with that data, it puts it in memory. Depending on how that data is managed (the on-or-off bit of information is literally called a bit because computer scientists are great at naming things), it usually forms part of a byte, which by convention is eight bits. In other words, you get eight bits of information per byte, represented by a series of 1s (electricity flowed for a fixed period of time) and 0s (no electricity flowed for that fixed period of time).

The uppercase “A” in the Latin alphabet, using the ASCII standard, is represented by the byte 01000001.

So, cool, this is binary. What does it actually mean? 

Read on for more.

Comments closed

Finding Index Fragmentation

Deepthi Goguri is hunting the most dangerous predator:

The bad page splits are the splits that we learned in the previous post which is the split that occurs when a random insert has to happen and there is no space on page, a new page gets created during the page split. These page splits are very expensive causing the fragmentation. Good page splits occurs when the append only inserts happen as the pages gets filled on the index pages to the right side of the index and new pages gets added as they gets filled up to the right side of the index. These types of good page splits doesn’t cause any index fragmentation. SQL Server will group these two types of page splits together and do not differentiate between them. So, how do we know to differentiate between the good and the bad page splits? Let’s learn more about this.

It is very difficult to differentiate these page splits by using the existing methods we have in the SQL Server like using the perfmon counter which has the pagesplits/sec counter. This counter will give the good and the nasty page splits together. There is a DMV sys.dm_db_index_operational_stats and an extended event page_split event to track the page splits.

Read on to see how we can find those undesirable page splits versus the benign ones.

Comments closed

The Mechanisms of Page Splits

Deepthi Gogrui takes us through page splitting in SQL Server:

Page splits can happen with the logical fragmentation and low page density causing the transactional log to be huge. Page splits are very expensive. As we learned from the previous post, page splits happens when SQL Server tries to insert a row and there is no more space on the page to fit in the page so page will split the page to give the space to fit it in that record. When SQL Server has to insert any rows in to the pages, first the free space on the page is checked at the header information of the page and if the record is within the mentioned free bytes, the record fits in. If the space is not contiguous within the page but there is total space to fit in the record, then the in-memory compaction of the page making the amount of free space on the page contiguous. This is not page split. Page split occurs when the space is not available on the page to fit in the record, in that case page split into half and the split point of the page is usually 50 percent each. Sometimes, the split can happen at the different point on the page is chosen by the storage engine as the obvious split point which is known as skewed page split. This is even more expensive when compared with the regular page split as this will create much larger transaction log.

What causes the page splits?

Read on for the full article.

Comments closed

Types of Fragmentation on Index Pages in SQL Server

Deepthi Goguri explains what sorts of fragmentation can occur on an index in SQL Server:

Logical Fragmentation occurs when the logical order of the leaf level pages (logical order meaning the next key values in order) no longer the continuous page to the next physical data file page. Because of these pages which are out of order will affect the read ahead mechanism and the scan performance. Because of this logical fragmentation, read ahead have to do smaller read ahead reads.

If the logical fragmentation pages are already in the memory than the read ahead mechanism will not be affected in that case. Logical fragmentation will cause the problem for bigger indexes and not for the smaller ones usually (smaller indexes having pages 1000-5000 pages). You can monitor the amount of the index logical fragmentation by using the DMV sys.dm_db_index_physical_stats.

Read on to learn more about logical fragmentation, as well peers extent fragmentation and low page density.

Comments closed

Doodles about the Storage Engine

Forrest McDaniel explains via image:

Paul Randal is a SQL Server legend with loads of informative articles. But when I was a baby DBA first reading Inside the Storage Engine, I got a little stuck. It took many passes before, eventually, finally, it clicked. I wish I had a lightweight introduction, so in the practice of paying it forward…

Here’s the starting point: sometimes it’s easier to manage lots of small things (say, the 1s and 0s of data) by grouping them into larger things. It’s the same reason you don’t buy rice by the grain.

Read on for that introduction to the storage engine.

Comments closed

Row Goals and Query Optimizer Estimates

Hugo Kornelis explains how row goals can change query optimizer behavior:

The most simple is when the query literally tells SQL Server that you don’t want to have all rows returned. Everyone knows the TOP clause, which is most commonly used for this. For ANSI portability, and because it adds a few options, you should also be aware of the FETCH and OFFSET modifiers to the ORDER BY clause, that have a similar functionality and are specifically designed to support paging. And there is of course the SET ROWCOUNT option, though I sincerely hope nobody actually uses that. All of these options literally tell SQL Server that we don’t want all results, only a part of them. The execution plan that would produce the entire set the fastest might not necessarily be the fastest way to get the few rows we actually want, so it’s a good thing that the optimizer has a way to come up with a different execution plan for these cases.

But there are plenty of other ways you might get a row goal, so check them out.

Comments closed

Another Batch of Common Query Plan Patterns

Erik Darling continues pulling one-star query plan patterns. First up is common subexpression spools:

The first time I heard the term “common subexpression spool” my brain went numb for a week.

It’s not a particularly mellifluous phrase, but it is helpful to understand what it is.

One easy way to think about it is a temporary cache for the result of a query, like a temp table inside your execution plan:

Then we look at optimizer choices around sorting:

Sometimes these Sorts are harmless, and sometimes they’re not. There are many situational things about the queries, indexes, available resources, and query plan appropriateness that will lead you to treating things differently.

Parameter sniffing, cardinality estimate accuracy, query concurrency, and physical memory are all potential reasons for these choices going great or going grog.

Read both of the whole things.

Comments closed

Inserted and Deleted Scans

Hugo Kornelis has a two-parter for us. First up is the inserted scan operator:

The Inserted Scan operator is only found in execution plans for code in triggers. It is used for queries that read data from the inserted pseudo-table. Its counterpart, Deleted Scan, reads from the deleted pseudo-table.

This pseudo-table contains a copy of all the rows that were inserted in AFTER INSERT triggers, or the new content of the data in all affected rows in AFTER UPDATE triggers. In INSTEAD OF INSERT or INSTEAD OF UPDATE triggers, the data in the inserted pseudo-table is the data that would have been inserted, or the data as it would have been after the update. In AFTER DELETE and INSTEAD OF DELETE triggers, using the inserted pseudo-table is allowed but returns no data.

Then, the deleted scan:

This pseudo-table contains a copy of all rows that has just been deleted in AFTER DELETE triggers, or the original data of all affected rows in AFTER UPDATE triggers. In INSTEAD OF DELETE or INSTEAD OF UPDATE triggers, the data in the deleted pseudo-table is the current data in the rows that would have been deleted or updated. In AFTER INSERT and INSTEAD OF INSERT triggers, using the deleted pseudo-table is allowed but returns no data.

Click through to see how they work.

Comments closed