Press "Enter" to skip to content

Category: Internals

Thinking About Linux Internals

Anthony Nocentino speculates on the internals of SQL Server on Linux:

OK, so everyone wants to know how Microsoft did it…how they got SQL Server running on Linux. In this article, I’m going to try to figure out how.

There’s a couple of approaches they could take…a direct port or some abstraction layer…A direct port would have been hard, basically any OS interaction would have had to been looked at and that would have been time consuming and risk prone. Who comes along to save the day? Abstraction. The word you hear about a million times when you take Operating Systems classes in undergrad and grad computer science courses. 🙂

Anthony talks about picoprocesses, which causes me to say that containers (like Docker) are probably the most important administrative concept of the decade.  If you don’t fundamentally get the concept, learning it opens so many doors.

Comments closed

Preemptive Scheduling

Ewald Cress looks at preemptive scheduling:

Cooperative scheduling is a relay race: you simply don’t stop without passing over the baton. If you write code which reaches a point where it may have to wait to acquire a resource, this waiting behaviour must be implemented by registering your desire with the resource, and then passing over control to a sibling worker. Once the resource becomes available, it or its proxy lets the scheduler know that you aren’t waiting anymore, and in due course a sibling worker (as the outgoing bearer of the scheduler’s soul) will hand the baton back to you.

This is complicated stuff, and not something that just happens by accident. The textbook scenario for such cooperative waiting is the traditional storage engine’s asynchronous disk I/O behaviour, mediated by page latches. Notionally, if a page isn’t in buffer cache, you want to call some form of Read() method on a database file, a method which only returns once the page has been read from disk. The issue is that other useful work could be getting done during this wait.

Read on for a detailed example looking at xp_cmdshell.

Comments closed

Scheduler Stories

Ewald Cress has a couple of posts about the scheduler.  First, fiber mode scheduling:

The title of this post is of course an allusion to Ken Henderson’s classic article The perils of fiber mode, where he hammers home the point that fiber scheduling, a.k.a. lightweight pooling, appears seductive until you realise what you have to give up to use it.

We’ll get to the juicy detail in a moment, but as a reminder, the perils of fibers lie in their promiscuity: many fibers may share one thread, its kernel structures and its thread-local storage. This is no problem for code that was written with fibers in mind, including all of SQLOS, but unfortunately there are bodies of code for which this isn’t true.

Next up is the Windows scheduler:

Hardware interrupts, which run in kernel mode and return to user mode quickly, should be nothing more than tiny hiccups in a running thread’s quantum. The other 90% of the interrupt iceberg manifests in user mode as Deferred Procedure Calls (DPCs, or “bottom halves” to the Linux crowd) but should still only steal small change in terms of CPU cycles. Context switches to another thread represent a completely different story, because it could be ages before control returns to our thread, meaning that our fiber scheduler is completely out of commission for a while.

This possibility – a SQLOS scheduler losing the CPU for an extended period – is just one of those things we need to live with, but on a sane server, it shouldn’t be something to be too concerned about. Consider that this happens all the time in virtualised environments, where our vCPU can essentially cease to exist while another VM has a ride on the physical CPU.

These are fairly long reads, but we’re getting to levels where you can see these settings in the Database Engine (like Lightweight Pooling).

Comments closed

Database Snapshot Creation History

Paul Randal shows how to read the master transaction log to find when database snapshots were created:

Earlier today someone asked on the #sqlhelp Twitter alias if there is a history of database snapshot creation anywhere, apart from scouring the error logs.

There isn’t, unfortunately, but you can dig around the transaction log of the master database to find some information.

When a database snapshot is created, a bunch of entries are made in the system tables in master and they are all logged, under a transaction named DBMgr::CreateSnapshotDatabase. So that’s where we can begin looking.

Click through for the script and some explanation around it.

Comments closed

Phantom Reads

Wayne Sheffield discusses phantom reads:

Run both code scripts again (Code Script 1 first, and Code Script 2 within 10 seconds). This time, you will see that Code Script 2 completes immediately without being blocked, and when Code Script 1 finishes, it has spawned additional data in its result set. A phantom read in action.

If you want to hide from phantom reads completely, then you’ll need to use either the serializable or snapshot transaction isolation levels. Both of these have the same concurrency effects: No dirty reads, non-repeatable reads, or phantom reads. The difference is in how they are implemented: the serializable transaction isolation level will block all other transactions affecting this data, while the snapshot isolation level utilizes row versions to create connection-specific versions of the table for the transaction – all of these row versions will cause increased activity in the tempdb database. Let’s take a look at how the snapshot isolation level will eradicate the phantom reads. First off, we need to modify the database to accept this isolation level.

After reading Wayne’s post, if you want a more academic (i.e., less fun) read, you can also go back to the Microsoft Research isolation levels paper, which describes most of the isolation levels we have in place today for SQL Server.

Comments closed

Page And Key WaitResources For Deadlocks

Kendra Little explains page and key information in deadlock graphs and blocking chains:

1.4) Can I see the data on the page that was locked?

Well, yes. But … do you really need to?

This is slow even on small tables. But it’s kinda fun, so… since you read this far… let’s talk about %%physloc%%!

%%physloc%% is an undocumented piece of magic that will return the physical record locator for every row. You can  use %%physloc%% with sys.fn_PhysLocFormatter in SQL Server 2008 and higher.

This was a very interesting read; check it out.

Comments closed

Scheduler Timing

Ewald Cress continues his look at schedulers:

To simplify things initially, we’ll forget about hidden schedulers and assume hard CPU affinity. That gives us an execution environment that looks like this:

  • Each CPU is physically tied to a scheduler.

  • Therefore, out of all the workers in the system, there is a subset of workers that will only run on that CPU.

  • Workers occasionally hand over control of their CPU to a different worker in their scheduler.

  • At any given moment, each CPU is expected to be running a worker that does something of interest to the middle or upper layers of SQL Server.

  • Some of this useful work will be done on behalf of the worker’s scheduler siblings.

  • However, a (hopefully) tiny percentage of a worker’s time is spent within the act of scheduling.

As usual, this is worth the read.

Comments closed

Understanding Page Splits

Wayne Sheffield goes into detail on page splits:

In considering which of these methods is preferred, we need to consider whether page splits impact these methods – especially nasty page splits. Furthermore, how will index maintenance affect each choice? So let’s think this through.

When there are negative values in this column, and the index is rebuilt, there will be a page with both negative and positive values in it. If the identity column is set to (-1, -1), there won’t be a gap (excluding the 0) in the values, and newly added rows will get a new page allocated – a good page split. If the identity column is set to (-2147483648 , 1), then there will be a full page with the records for the most recently used identity value, and with the values starting with 1 – a rather large gap.

This is worth reading in its entirety.

Comments closed

ReaderWriterSpinlock

Ewald Cress looks at the new ReaderWriterSpinlock in SQL Server 2016 CU2:

As a quick refresher, a traditional SQLOS spinlock is a 32-bit integer, or of course 64-bit as of 2016, with a value of either zero (lock not acquired) or the 32-bit Windows thread ID of the thread that owns it. All very simple and clean in terms of atomic acquire semantics; the only fun part is the exponential backoff tango that results from a collision.

We have also observed how the 2016 flavour of the SOS_RWLock packs a lot of state into 64 bits, allowing more complicated semantics to be implemented in an atomic compare-and-swap. What seems to be politically incorrect to acknowledge is that these semantics boil down to a simplified version of a storage engine latch, who is the unloved and uncool grandpa nowadays.

Clearly a lot can happen in the middle of 64 bits.

Definitely worth a read, as it seems that this is going to get more play in the years to come.

Comments closed