
Category: Internals

Understanding the Eager Index Spool

Paul White gives us a lesson on eager spools:

Index spools do not tell the optimizer they support output ordered by the spool’s index keys. If sorted output from the spool is required, you may see an unnecessary Sort operator. Eager index spools should often be replaced by a permanent index anyway, so this is a minor concern much of the time.

There are five optimizer rules that can generate an Eager Index Spool option (known internally as an index on-the-fly). We will look at three of these in detail to understand where eager index spools come from.

Read on for a detailed discussion of eager spools.

Comments closed

Worker Migration in SQL Server 2019

Dong Cao explains an interesting improvement to SQL Server 2019’s internals:

Worker migration (AKA “worker stealing”) allows an idle SOS scheduler to migrate a worker from the runnable queue of another scheduler on the same NUMA node and immediately resume the task of the migrated worker. This enhancement provides more balanced CPU usage and reduces the amount of time long-running tasks spend in the runnable queue.

A long-running task that is enabled for worker migration is no longer bound to a fixed scheduler. Instead, it will frequently move across schedulers within the same NUMA node, which naturally results in less loaded schedulers. Together with the existing load factor mechanism, worker migration provides SQL Server with an enriched solution for balanced CPU usage.
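
To make the idea concrete, here is a toy model of worker stealing in Python. It is only a sketch: the Scheduler class, the migrate_worker function, and the "take from the busiest same-node scheduler" rule are my own assumptions for illustration, not how SQLOS actually picks a worker.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Toy stand-in for an SOS scheduler (hypothetical, not a SQLOS structure)."""
    name: str
    numa_node: int
    runnable: deque = field(default_factory=deque)


def migrate_worker(idle, schedulers):
    """Move one worker to `idle` from a scheduler on the same NUMA node.

    Returns the migrated worker, or None when `idle` already has work or
    no same-node scheduler has anything in its runnable queue.
    """
    if idle.runnable:
        return None  # not idle, so nothing to steal
    donors = [
        s for s in schedulers
        if s is not idle and s.numa_node == idle.numa_node and s.runnable
    ]
    if not donors:
        return None
    # Assumption: prefer the scheduler with the longest runnable queue.
    donor = max(donors, key=lambda s: len(s.runnable))
    worker = donor.runnable.popleft()
    idle.runnable.append(worker)
    return worker


s0 = Scheduler("s0", numa_node=0, runnable=deque(["w1", "w2", "w3"]))
s1 = Scheduler("s1", numa_node=0)  # idle
s2 = Scheduler("s2", numa_node=1, runnable=deque(["w4", "w5", "w6", "w7"]))

moved = migrate_worker(s1, [s0, s1, s2])
print(moved, list(s0.runnable), list(s1.runnable))
```

Note that s2 has the longest queue but sits on a different NUMA node, so the idle scheduler takes its worker from s0 instead.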

Click through to understand where this is particularly useful and what the performance implications are.


Debugging with PerfView

Erik Darling takes us through PerfView:

If you seriously want to debug, you’ve gotta use something like WinDbg. It’s tough, but powerful.

It’s also… Where do you even start? The most common thing to do is set a Break Point.

Figuring out where you wanna set it can be challenging.
– Sometimes you can capture a thread and step through the code.
– Sometimes you can find the call stack you want by searching through symbols

The problem with these is that stepping through code that runs in milliseconds has taken me over an hour at times, and call stacks often have cryptic names.

This is where tools like PerfView can help.

Erik does an excellent job going through PerfView and introducing people to debugging SQL Server.


Creating Graph Tables in SQL Server

Mala Mahadevan continues a series on graph tables in SQL Server:

I have highlighted in red what SQL Server adds to the table – the two system columns – graph id, which is bigint, and node id, which is nvarchar and stores json, and the unique index to help with queries.

We can also see from constraint type that this table is similar to other relational tables – it can be enabled for replication and can have related delete or update actions defined on it if need be.

This post gives a bit more insight into how graph tables work in SQL Server under the covers.


strace and SQL Server Containers

Anthony Nocentino tries using strace to diagnose SQL Server process activity in a container:

We’re attaching to an already running docker container running SQL. But what we get is an idle SQL Server process. This is great if we have a running workload we want to analyze, but my goal for all of this is to see how SQL Server starts up, and this isn’t going to cut it.

My next attempt was to stop the sql19 container and quickly start the strace container, but the strace container still missed events at the startup of the sql19 container. So I needed a better way.

Don’t worry: Anthony finds a better way.


Estimates outside the Histogram Range

Josh Darnell shows us how SQL Server calculates estimates for input values outside of the range of your relevant statistic’s histogram:

I have the impression that CSelCalcColumnInInterval “fails” if the predicate doesn’t fall within any of the histogram intervals. The estimation logic then chooses to try the CSelCalcAscendingKeyFilter calculator (a reference to the “ascending key problem”) if the predicate is specifically higher than the last histogram interval.

Josh includes a couple of demos as well, so check them out.


The Transaction Log in Delta Tables

Burak Yavuz, et al., explain how the transaction log works with Delta Tables in Apache Spark:

When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit is written out as a JSON file, starting with 000000.json. Additional changes to the table generate subsequent JSON files in ascending numerical order so that the next commit is written out as 000001.json, the following as 000002.json, and so on.
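
The naming scheme is simple enough to sketch in a few lines of Python. This is an illustration only: commit_file_name and next_commit_file are hypothetical helpers, and the six-digit width follows the excerpt above. As far as I know, real Delta Lake logs pad the version to 20 digits, which is why the width is a parameter here.

```python
import re

# Matches names like "000003.json" and captures the version number.
COMMIT_NAME = re.compile(r"(\d+)\.json")


def commit_file_name(version, width=6):
    """Zero-padded JSON file name for a given commit version."""
    if version < 0:
        raise ValueError("commit versions start at 0")
    return f"{version:0{width}d}.json"


def next_commit_file(existing, width=6):
    """Given file names already in _delta_log, name the next commit."""
    versions = []
    for name in existing:
        match = COMMIT_NAME.fullmatch(name)
        if match:  # ignore anything that is not a commit file
            versions.append(int(match.group(1)))
    next_version = max(versions) + 1 if versions else 0
    return commit_file_name(next_version, width)


print(next_commit_file([]))
print(next_commit_file(["000000.json", "000001.json"]))
```

An empty log directory yields the first commit name, and each later call picks the next number in ascending order.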

It’s interesting that they chose JSON instead of a binary transaction log like relational databases use.


Optimize For Sequential Key

Pam Lahoud explains the context behind a new option you can add to indexes in SQL Server 2019 CTP 3.1 and later:

With last page insert contention, as the number of insert threads increases, the queue for the page latch increases which in turn increases latency. Throughput will also decrease, but if something slows down one of the threads that is holding the latch, this can trigger a convoy and throughput suddenly falls off a cliff. This typically happens when a page fills up and a new page must be added to the index (also known as a page split). The insert that triggers the new page will naturally have to hold the latch for longer than normal while the new page operation completes. This causes the queue to build up behind the latch. Adding a new page also requires an exclusive latch on the parent page, which can cause latch requests to queue at that level as well. At this point, throughput falls off a cliff.

OPTIMIZE_FOR_SEQUENTIAL_KEY aims to do two things – control the rate at which new threads are allowed to request the latch, and favor threads that are likely to keep the throughput high. These techniques will not prevent the contention or reduce latency, but they will help keep throughput consistent as concurrency increases.

Read on to learn more about the process and, by implication, some places where this won’t work well for you at all.


Relating Nonclustered Indexes to the Clustered Index

Erin Stellato takes us through a bit of indexing strategy:

In the sp_SQLskills_helpindex output you’ll notice that OrderLineID is the only column in columns_in_tree, and both OrderLineID and OrderID_and_OrderLineID are in columns_in_leaf.

Remember that when you identify a nonclustered index as unique, it will not add the clustering key to the tree level; it just adds it to the leaf level. But if the nonclustered index is not declared as unique, then SQL Server adds the clustering key to the tree level to uniquely identify the row.
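
The rule in that second paragraph can be written down as a small Python function. Treat it as a sketch under assumptions: nonclustered_index_columns is a hypothetical helper, the clustering key of (OrderID, OrderLineID) is my guess at the demo table rather than something stated in the excerpt, and included columns and uniquifiers are left out.

```python
def nonclustered_index_columns(key_columns, clustering_key, is_unique):
    """Return (columns_in_tree, columns_in_leaf) for a nonclustered index.

    Models only the rule quoted above: clustering key columns not already
    in the index key are added to the leaf level always, and to the tree
    level only when the index is not declared unique.
    """
    extra = [col for col in clustering_key if col not in key_columns]
    leaf = list(key_columns) + extra
    tree = list(key_columns) if is_unique else list(leaf)
    return tree, leaf


# Assumed clustering key for illustration.
clustering_key = ["OrderID", "OrderLineID"]

unique_tree, unique_leaf = nonclustered_index_columns(
    ["OrderLineID"], clustering_key, is_unique=True
)
plain_tree, plain_leaf = nonclustered_index_columns(
    ["OrderLineID"], clustering_key, is_unique=False
)

print(unique_tree, unique_leaf)
print(plain_tree, plain_leaf)
```

With the unique index, only OrderLineID appears in the tree and OrderID shows up at the leaf. Without the unique declaration, OrderID is carried in the tree as well.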

This post deserves a careful reading.


Database Page Allocations Function

Max Vernon takes us through the sys.dm_db_database_page_allocations Dynamic Management Function:

sys.dm_db_database_page_allocations is an undocumented SQL Server T-SQL Dynamic Management Function. This DMF provides details about allocated pages, allocation units, and allocation extents.

Read on for additional details. This is an undocumented function, so it might change between versions, but it will give you an idea of how page allocation works under the covers.
