
Category: Internals

The Transaction Log Architecture

Paul Randal continues a series on the transaction log:

In the first part of this series, I introduced basic terminology around logging, so I recommend you read that before continuing with this post. Everything else I’ll cover in the series requires knowing some of the architecture of the transaction log, so that’s what I’m going to discuss this time. Even if you’re not going to follow the series, some of the concepts I’m going to explain below are worth knowing for everyday tasks DBAs handle in production.

Read on to learn more about some key transaction log terminology.
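If you want something concrete to poke at alongside the architecture discussion, here's a small sketch of mine (not Paul's) that lists the virtual log files (VLFs) making up the current database's log. It assumes SQL Server 2016 SP2 or later, where sys.dm_db_log_info is available:

-- List the VLFs that make up the current database's transaction log.
-- Requires SQL Server 2016 SP2 or later.
SELECT file_id,
       vlf_sequence_number,
       vlf_size_mb,
       vlf_active,
       vlf_status
FROM sys.dm_db_log_info(DB_ID())
ORDER BY vlf_sequence_number;

A log carved into a huge number of tiny VLFs is one of the classic transaction log management headaches.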


Understanding the Transaction Log

Paul Randal has a new series:

With this post, I’m starting an occasional series on the transaction log and how it works and should be managed, and I’ll touch on all the problems above over its course. In this post, I’ll explain what logging is and why it’s required.

Basic Terminology Around Logging

When I’m talking about any mechanism in SQL Server, I find there’s a chicken-and-egg problem where I need to use a word or phrase before I’ve explained it. To avoid that problem in this series, I’m going to start by explaining some terminology that needs to be used when discussing logging, and I’ll expand on many of these terms as the series progresses.

This post starts off with some of the basics, and it's always good to get the occasional refresher.
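If you'd like to see some raw log records while reading along, here's a quick sketch of mine (not Paul's) that uses the undocumented, unsupported sys.fn_dblog function; stick to a test database:

-- Show a sample of log records from the current database's transaction log.
-- sys.fn_dblog is undocumented and unsupported, so use it for exploration only.
SELECT TOP (20)
       [Current LSN],
       Operation,
       Context,
       [Transaction ID],
       [Log Record Length]
FROM sys.fn_dblog(NULL, NULL);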


Understanding SQL Server I/O Size

Anthony Nocentino dives into a topic:

The Master File Table (MFT) is the data structure that describes files and directories on NTFS. In Figure 1, you can see an MFT record has several sections describing the metadata about the file and pointers to blocks that make up the file. A block, also referred to as a cluster in Windows, is an abstraction over one or more physical structures (sectors or pages depending on the media) presented by the underlying disk. A block/cluster is also the atomic allocation unit of a file system and has a configurable size. On NTFS, this is referred to as the NTFS Allocation Unit Size and is a configurable attribute of the file system. By default, it is 4KB and can be as large as 2MB. Since a block is a unit of allocation, if a file is between 1 byte and the file system’s allocation unit size, it will take up exactly one block/cluster on the file system. As the file grows, more blocks/clusters are allocated to represent the file. The MFT data structure tracks which blocks make up a file. The block allocator of the file system will try to ensure blocks are physically adjacent on the disk, grouping them together in runs.

A 4KB NTFS Allocation Unit size is considered best practice on general-purpose file systems. And 64KB is considered best practice for SQL Server…but why? Let’s keep digging…

I was sitting at a lunch table with Anthony when it all clicked, and that was fun to see.
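If you want to see what IO sizes your own instance is actually issuing, here's a rough sketch of mine (not Anthony's) that computes average read and write sizes per database file from sys.dm_io_virtual_file_stats. The counters are cumulative since startup, so treat the averages as a coarse signal:

-- Average IO size per database file, cumulative since instance startup.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name AS logical_file_name,
       vfs.num_of_reads,
       vfs.num_of_bytes_read / NULLIF(vfs.num_of_reads, 0) AS avg_read_bytes,
       vfs.num_of_writes,
       vfs.num_of_bytes_written / NULLIF(vfs.num_of_writes, 0) AS avg_write_bytes
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id
   AND vfs.file_id = mf.file_id
ORDER BY database_name, logical_file_name;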


Why the Optimizer Doesn’t Look at Buffer Pool Data

Paul Randal has an explanation for us:

SQL Server has a cost-based optimizer that uses knowledge about the various tables involved in a query to produce what it decides is the most optimal plan in the time available to it during compilation. This knowledge includes whatever indexes exist and their sizes and whatever column statistics exist. Part of what goes into finding the optimal query plan is trying to minimize the number of physical reads needed during plan execution.

One thing I’ve been asked a few times is why the optimizer doesn’t consider what’s in the SQL Server buffer pool when compiling a query plan, as surely that could make a query execute faster. In this post, I’ll explain why.

This is an interesting post because it explains why the developers of the database engine would purposefully ignore something that could make things faster, but at a potentially devastating cost.
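For a sense of the information the optimizer chooses to ignore, here's a standard sort of query (my sketch, not Paul's) showing how many pages of each object in the current database are sitting in the buffer pool right now:

-- Pages per object currently in the buffer pool for this database.
SELECT o.name AS object_name,
       COUNT_BIG(*) AS pages_in_buffer_pool,
       COUNT_BIG(*) * 8 / 1024 AS approx_mb_in_buffer_pool
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
    ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
    OR (au.type = 2 AND au.container_id = p.partition_id)
JOIN sys.objects AS o
    ON p.object_id = o.object_id
WHERE bd.database_id = DB_ID()
  AND o.is_ms_shipped = 0
GROUP BY o.name
ORDER BY pages_in_buffer_pool DESC;

Run it a few times under load and you'll see the picture shift, which hints at why it would be shaky ground for plan choices.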


Diagnosing Stalled Dispatchers

Sean Gallardy has another memory dump to investigate:

If you’ve been following along with the series of demystifying dumps, then you’re probably thinking or have thought something along the lines of, “All of these issues seem to be around stalled or non-yielding things…” and you’d be correct. If you revisit the non-yielding scheduler dump post, I explain a little about cooperative scheduling and how things need to play nice: if one piece isn’t playing nice, whether by not doing its part in a timely manner or by not giving up time to others, that’s going to be a very large problem for the system in general. The stalled dispatcher falls into the same set of issues.

There are various dispatcher pools in SQL Server, though the most widely known one is for extended events. Dispatcher pools are items that hold pieces of work to be completed, like a dumping ground for “someone pick this up and do this” type of tasks. If it helps, you can think of them in a similar way to connection pooling, where there are background threads which will do something (like making or updating connections) when needed. A pool can respond to pressure by making new threads if needed or trimming threads when the work slows down. In SQL Server, dispatchers set their own attributes, including when an item is considered stuck, the number of threads to use, etc., which means some may never be considered stalled, as they are able to disable their own timeouts.

Read on to see what might cause a stalled dispatcher and what a memory dump for one can tell us.
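If you're curious which dispatcher pools your instance has, a quick look at sys.dm_os_dispatcher_pools (my addition, not Sean's) lists them along with thread counts, queue depth, and the timeout setting Sean mentions:

-- Dispatcher pools on this instance, with thread counts and queue depth.
SELECT name,
       dispatcher_count,
       dispatcher_ideal_count,
       queue_length,
       dispatcher_timeout_ms
FROM sys.dm_os_dispatcher_pools
ORDER BY name;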


Intent Shared Locks in RCSI

Tomas Zika troubleshoots a deadlocking problem:

I was analyzing a deadlock graph and there was a mystery lock of type IS (Intent Shared). That was weird by itself, because the database has Read Committed Snapshot Isolation (RCSI) enabled, which is an optimistic concurrency model that shouldn’t take shared locks. All the statements were contained in this database. Also, the locked table was seemingly unrelated to anything that had been going on in the deadlock report.

Click through for an image, a repro script, and an explanation as to what exactly is going on.
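Tomas includes a proper repro script, so I won't duplicate it here, but if you want to watch for Intent Shared locks while experimenting, a generic query like this one of mine shows current IS lock requests and what they're taken on:

-- Current Intent Shared (IS) lock requests across the instance.
SELECT request_session_id,
       resource_type,
       DB_NAME(resource_database_id) AS database_name,
       resource_associated_entity_id,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE request_mode = 'IS';

For OBJECT resources, resource_associated_entity_id is the object_id; for PAGE, KEY, and HOBT resources, it maps to a hobt_id that you can resolve through sys.partitions.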


Non-Yielding IO Completion Ports

Sean Gallardy is here to demystify a concept:

IO Completion Ports are a set of Windows APIs which allow for efficient, fast, multithreaded asynchronous IO. Great, that pretty much tells you nothing.

SQL Server uses IO Completion Ports not for disk-based IO but for general network IO when it comes into SQL Server for TDS-level items. This means it’s used for things such as connecting to an instance of SQL Server, sending batch and RPC information, etc., and is used to properly take action on the incoming items. These actions should be extremely short and quick; the name of the game is low latency and high throughput, which means not doing things like reading from or writing to disk, allocating memory, or calling functions that may block, in order to keep things flowing.

Read on to see what happens when there is a problem and what might cause that problem.
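This one is more about Windows internals than T-SQL, but for a loose illustration of the network traffic those IO Completion Port threads service, sys.dm_exec_connections shows the transport and packet counts per connection:

-- Network transport and packet counts per connection.
SELECT session_id,
       net_transport,
       protocol_type,
       num_reads,
       num_writes,
       connect_time
FROM sys.dm_exec_connections
ORDER BY num_reads DESC;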


Fill Factor and When It Matters

Raul Gonzalez has a confession to make:

I love SQL Server internals, I do and I just said it.

Why? Because thanks to all the tools, documentation and community members that share their knowledge, folks like me can understand how a super complex piece of software like a relational database engine works (or at least a small part of it).

Click through for a discussion of fill factor and one area where Raul thinks it falls short. I'm not sure that I agree, but I'd need to think about it to give a clear explanation as to why.
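If you want to experiment with fill factor yourself, here's a minimal sketch of mine (the table and index names are hypothetical) that sets a fill factor on a rebuild and then compares it to how full the leaf pages actually are:

-- Hypothetical table and index names; substitute your own.
ALTER INDEX IX_Example ON dbo.SomeTable
REBUILD WITH (FILLFACTOR = 90);

-- Compare the configured fill factor to actual page fullness per index level.
SELECT i.name AS index_name,
       i.fill_factor,
       ips.index_level,
       ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SomeTable'), NULL, NULL, 'DETAILED') AS ips
JOIN sys.indexes AS i
    ON ips.object_id = i.object_id
   AND ips.index_id = i.index_id;

Keep in mind that fill factor is only applied when an index is created or rebuilt; it isn't maintained during day-to-day inserts and updates.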


Calculating the Adaptive Join Threshold

Paul White breaks out the math books:

One thing I want you to bear in mind throughout this piece is that an adaptive join always starts executing as a batch mode hash join. This is true even if the execution plan indicates the adaptive join expects to run as a row mode apply.

Like any hash join, an adaptive join reads all rows available on its build input and copies the required data into a hash table. The batch mode flavour of hash join stores these rows in an optimized format, and partitions them using one or more hash functions. Once the build input has been consumed, the hash table is fully populated and partitioned, ready for the hash join to start checking probe-side rows for matches.

This is the point where an adaptive join makes the decision to proceed with the batch mode hash join or to transition to a row mode apply. If the number of rows in the hash table is less than the threshold value, the join switches to an apply; otherwise, the join continues as a hash join by starting to read rows from the probe input.

Read the whole thing and learn more about the cutoffs for adaptive joins.
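If you want to see the threshold for your own plans, the showplan XML exposes it as the AdaptiveThresholdRows attribute on the adaptive join operator. Here's a rough sketch of mine (not Paul's), using the attribute names from the showplan schema, that pulls it out of the plan cache:

-- Find cached plans containing adaptive joins and show their threshold row counts.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       qp.query_plan.value('(//RelOp[@IsAdaptive="true"]/@AdaptiveThresholdRows)[1]', 'float') AS adaptive_threshold_rows,
       qp.query_plan.value('(//RelOp[@IsAdaptive="true"]/@EstimatedJoinType)[1]', 'nvarchar(60)') AS estimated_join_type,
       st.text AS query_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE qp.query_plan.exist('//RelOp[@IsAdaptive="true"]') = 1;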
