Let’s start with the safety convention. The “null” of a null pointer isn’t a magic value, but in real-life implementation is simply zero, which is a perfectly valid virtual address. However, on the premise that trying to access address zero or addresses near it probably indicates a program error, the OS will map that page in such a way that trying to access it causes an access violation. This is not a bug or an accident, but a damn clever feature! Robert Love explains it very nicely over here for Linux, and it applies equally to Windows.
Now recall the convention that trying to retrieve the head or tail of an empty list will – by convention – bring you back a null pointer. When iterating, a related convention may also return a zero when you’ve gone all the way around and come back to the list head. Clearly the onus is on the developer to recognise that null pointer and not dereference it, but attempting to do so sets in motion the safety feature of an access violation, which can then be neatly caught through standard exception handling, for instance yielding a diagnostic stack dump.
Very interesting article, and also a good juxtaposition of supported, “production-safe” code versus undocumented processes.
Boring old disclaimer: What I am describing here is undocumented, unsupported, likely to change between versions, and will probably make you go blind. In fact, the depth of detail exposed illustrates one reason why Microsoft would not want to document it: if end users of SQL Server found a way to start relying on this not changing, it would hamstring ongoing SQL Server improvement and refactoring.
With that out of the way, let’s dive right into DBCC TEC, a command which can dump a significant chunk of the object tree supporting a SQL Server session. This result is the same thing that shows up within a dump file, namely the output of the CSession::Dump() function – it’s just that you can invoke this through DBCC without taking a dump (cue staring match with Kendra). Until corrected, I shall imagine that TEC stands for Thread Execution Context.
I appreciate Ewald’s ability to make sense out of the madness of database internals.
A few episodes ago, I talked about how learning about Write Ahead Logging was a light bulb moment for me, and helped me learn tons of concepts about backups and recovery. This week, we talk about when SQL Server turns things upside down and doesn’t use write ahead logging: and what it has to do for recovery in these special cases.
Click through for the video.
So far, this is all rather abstract, and we’re just seeing numbers which may or may not be pointers, pointing to heaven knows what. Let me finish off by illuminating one trail to something we can relate to.
Out of the five local storage slots which contain something, the first one here points to 00000000`3b656040. As it turns out, this is an instance of the CCompExecCtxtBasic class, and you’ll just have to take my word for it today. Anyway, we’re hunting it for its meat, rather than for its name. Have some:
Click through for details, including a graphic.
Notionally we can imagine a global portfolio of active memory allocations, each chunk uniquely identified by its starting address. When we want memory, we ask the global memory manager to lend us some from the unused pool, and when we’re done with it, we hand it back to that memory manager, who carefully locks its internal structures during such operations, because we should only access mutable global state in a single-threaded manner, and…. Oops. No, no, double no. That is not how SQL Server does things, right?
Okay, we know that there are actually a variety of memory allocators out there. If nothing else, this avoid the single bottleneck problem. But now the question becomes one of knowing which allocator to return a chunk of memory to after we’re done with it.
As usual, this is a deep and interesting blog post from Ewald.
The first and simpler GetData() overload doesn’t show up in a vftable, but the second does. Oddly, the vftable for the second one lives at an offset of +0x1448 into the class instance – you’re going to have to trust me on this one. So the rcx passed into either variation will actually be the same one! But if the virtual version is called, it needs to find its position relative to +0x1448 dynamically, by doing a data lookup. We can confirm that by peeking at what is saved four bytes earlier at +0x1444, and that is indeed the value zero.
Ewald explains how this is vital to multiple inheritance and this post is only guaranteed to make your brain hurt a little bit.
PFS pages occur every 8088 pages in every data file and store a byte of information about itself and the following 8087 pages. The most important piece of information it stores is whether a page is allocated (in use) or not. You can read more about PFS pages and the other per-database allocation bitmaps in this blog post.
So why can’t they be repaired by DBCC CHECKDB, when all the other per-database allocation bitmaps can?
The answer is that the is-this-page-allocated-or-not information is not duplicated anywhere else in the database, and it’s impossible to reconstruct it in all cases.
In case you’re not particularly familiar with PFS pages, Paul has a blog post from 2006 describing GAM, SGAM, and PFS pages.
The rdi register is used as a scratchpad to save the incoming rcx value for later – first it is used to restore the possibly clobered rcx value for the second child function call, and then it becomes the return value going into rax. However, since we don’t want to clobber its value for our caller, we first save its old value and then, at the very end, restore it. This is done through a traditional push/pop pair, which in x64 is only allowed in the function prologue and epilogue.
Similarly, ebx (the bottom half of rbx) is used to save the incoming value of the second parameter in edx. However, here we don’t use a push/pop pair, but instead save it in a slot in the shadow space. Why two different mechanisms for two different registers? Sorry, that’s one for the compiler writers to answer.
There’s a lot of detail involved here, but it’s well worth the read.
In several of my last few blog posts, I’ve shared several methods of getting internal information from a database by using the DBCC PAGE command and utilizing the “WITH TABLERESULTS” option to be allowed to automate this process for further processing. This post will also do this, but in this case, we’ll be using it to bust a common myth—data in a clustered index is physically stored on disk in the order of the clustered index.
Busting this myth
To bust this myth, we’ll create a database, put a table with a clustered index into this database, and then we’ll add some rows in random order. Next, we will show that the rows are stored on the pages in logical order, and then we’ll take a deeper look at the page internals to see that the rows are not stored in physical order.
Read on for the proof.
Per-processor partitioning of certain thread management functions makes perfect sense, since we’d aim to minimise the amount of global state. Thus each processor would have its own dispatcher state, its own timer list… And hang on, this is familiar territory we know from SQLOS! The only difference is that SQLOS operates on the premise of virtualising a CPU in the form of a Scheduler, whereas the OS kernel deals with physical CPUs, or at least what it honestly believes to be physical CPUs even in the face of CPU virtualisation.
This is a start to a very interesting series.