Category: Internals

Adaptive Join Internals

Published 2017-05-15 by Kevin Feasel

Dmitry Pilugin digs deep into how adaptive joins work in SQL Server 2017:

We have three types of physical join algorithms in SQL Server: hash, nested loops and merge. Adaptive join allows SQL Server automatically choose an actual physical algorithm on the fly between the first two – hash (HM) and nested loops (NL).

NL has two join strategies – naive nested loops join (inner loop scans the whole inner table or index) and index nested loops join (index on the join column of the inner table is used to find necessary rows and then those rows are applied to the outer row, also called Nested Loops Apply). Typically, the second one performs very well if you have rather small input on the outer side and indexed rather big input on the inner side.

HM is more universal and uses hash algorithms to match rows, so no indexes are necessary. You may refer to my blog post Hash Join Execution Internals for more details.

Adaptive Join starts execution as a Hash Join. It consumes all the input of the build phase and looks at the adaptive join threshold, if the number of rows is more or equal this threshold it will continue as a hash join. However, if the number of rows is less than this threshold, it will switch to a NL.

If you want to get a better understanding of how adaptive joins works, Dmitry’s post is a great start.

Comments closed

SOS_UnfairMutexPair

Published 2017-05-08 by Kevin Feasel

Ewald Cress talks about SOS_UnfairMutexPair:

The focal point of the mutex’s state – in fact one might say the mutex itself – is the single Spinlock bit within the 32-bit lock member. Anybody who finds it zero, and manages to set it to one atomically, becomes the owner.

Additionally, if you express an interest in acquiring the lock, you need to increment the WaiterCount, whether or not you managed to achieve ownership at the first try. Finally, to release the lock, atomically set the spinlock to zero and decrement the WaiterCount; if the resultant WaiterCount is nonzero, wake up all waiters.

Now one hallmark of a light-footed synchronisation object is that it is very cheap to acquire in the non-contended case, and this class checks that box. If not owned, taking ownership (the method SOS_UnfairMutexPair::AcquirePair()) requires just a handful of instructions, and no looping. The synchronisation doesn’t get in the way until it is needed.

However, if the lock is currently owned, we enter a more complicated world within the SOS_UnfairMutexPair::LongWait() method.

I love the statement that “This is not a very British class at all.” Read the whole thing.

Comments closed

Default Column Storage

Published 2017-04-28 by Kevin Feasel

Paul Randal explains how default column values are stored:

And selecting the initial 10 rows can be demonstrated to return the 3rd column using the initial default set in step 3. (It makes no difference if any rows are added between steps 3 and 4.)

This means that there *must* be two default values stored when a new column is added: one for the set of already-existing rows that don’t have the new column and one for any new rows. Initially these two default values will be the same, but the one for new rows can change (e.g. in steps 4 and 5 above) with breaking the old rows. This works because after the new column is added (step 3 above), it’s impossible to add any more rows that *don’t* have the new column.

And this is exactly how it works. Let’s investigate!

In typical Paul Randal fashion, this is both a look at internals and an interesting explanation.

Comments closed

Secret Mysteries Of SQL Server 2017

Published 2017-04-24 by Kevin Feasel

Erik Darling goes spelunking for new and modified internal objects in SQL Server 2017:

sys.dm_db_missing_index_group_stats

sys.dm_db_missing_index_group_stats
What have we here?

Of particular interest are last_sql_handle, query_hash, and query_plan_hash. It appears that we’ll finally be able to easily tie missing index requests to their queries, without doing a lot of painful XML processing. I had planned on adding something like this, but couldn’t find a good fit between 1) adding XML processing to sp_BlitzIndex, or adding more DMV queries and rather unpleasant XML processing to sp_BlitzCache. This will make implementing it far easier, assuming it works the way it looks like it will work.

Erik has three examples of interesting additions in CTP 2.0.

Comments closed

Trying Out Interleaved Execution

Published 2017-04-20 by Kevin Feasel

Dmitry Pilugin has a test of interleaved execution of a multi-statement table-valued function:

First of all, note that Estimated Number of Rows is 10 000 now, which is correct and equals Actual Number of Rows. Due to the correct estimate the optimizer decided that there are enough rows to benefit from a partial (local/global) aggregation and introduced a partial Hash aggregate before the join.

If you take a Profiler, enable events SP:StmtStarting, SP:StmtCompleted, SQL:StmtStarting, SQL:StmtCompleted and run the query without and with a TF, you’ll see what does it actually mean “interleaved” in terms of the execution sequence.

During the regular execution the query starts executing, then the function is executed, the query continues execution and finishes the execution. We see the following event sequence:

This is a very thorough post, but if you have multi-statement TVFs, you absolutely want to read it.

Comments closed

Adaptive Query Processing In CTP 2.0

Published 2017-04-20 by Kevin Feasel

Joe Sack has a couple blog posts on adaptive query processing enhancements in SQL Server 2017 CTP 2.0. First, Batch Mode Adaptive Joins:

We have seen numerous cases where providing a specific join hint solved query performance issues for our customers. However, the drawback of adding a hint is that we remove join algorithm decisions from the optimizer for that statement. While fixing a short-term issue, the hard-coded hint may not be the optimal decision as data distributions shift over time.

Another scenario is where we do not know up front what the optimal join should be, for example, with a parameter sensitive query where a low or high number of rows may flow through the plan based on the actual parameter value.

With these scenarios in mind, the Query Processing team introduced the ability to sense a bad join choice in a plan and then dynamically switch to a better join strategy during execution.

That one’s really cool. Joe also talks about interleaved execution for multi-statement TVFs:

SQL Server has historically used a unidirectional “pipeline” for optimizing and executing queries. During optimization, the cardinality estimation process is responsible for providing row count estimates for operators in order to derive estimated costs. The estimated costs help determine which plan gets selected for use in execution. If cardinality estimates are incorrect, we will still end up using the original plan despite the poor original assumptions.

Interleaved execution changes the unidirectional boundary between the optimization and execution phases for a single-query execution and enables plans to adapt based on the revised estimates. During optimization if we encounter a candidate for interleaved execution, which for this first version will be multi-statement table valued functions (MSTVFs), we will pause optimization, execute the applicable subtree, capture accurate cardinality estimates and then resume optimization for downstream operations.

The goal here is to make Table-Valued Functions viable from a performance perspective. The team has a long way to go there, but they’re working on it. Also, Joe gives a shout out to Arun Sirpal, frequent curatee.

Comments closed

TempDB Spill And Checkpoint Writes

Published 2017-04-06 by Kevin Feasel

Lonny Niederstadt shows that the -k startup option throttles two separate things:

The -k startup option can throttle checkpoint writes, and can throttle tempdb spills.
On my systems, I’ve never seen an overwhelming checkpoint but I’ve seen plenty of overwhelming spills to tempdb. But are spill writes through the checkpoint mechanism? If so, then -k would just be throttling checkpoint writes to persistent databases and to tempdb – same function in two contexts.

Let’s take a look. I’ll look at the same test scenarios I used in my March 29, 2017 blog post: an “insert into #temptable select…”, a “select… into #temptable”, and an index create with sort_in_tempdb.

For further reference, here is his preliminary research into the -k option.

Comments closed

Doubly Linked Lists And Bugs

Published 2017-04-06 by Kevin Feasel

Ewald Cress explains a bug in DBCC PAGE:

Let’s start with the safety convention. The “null” of a null pointer isn’t a magic value, but in real-life implementation is simply zero, which is a perfectly valid virtual address. However, on the premise that trying to access address zero or addresses near it probably indicates a program error, the OS will map that page in such a way that trying to access it causes an access violation. This is not a bug or an accident, but a damn clever feature! Robert Love explains it very nicely over here for Linux, and it applies equally to Windows.

Now recall the convention that trying to retrieve the head or tail of an empty list will – by convention – bring you back a null pointer. When iterating, a related convention may also return a zero when you’ve gone all the way around and come back to the list head. Clearly the onus is on the developer to recognise that null pointer and not dereference it, but attempting to do so sets in motion the safety feature of an access violation, which can then be neatly caught through standard exception handling, for instance yielding a diagnostic stack dump.

Very interesting article, and also a good juxtaposition of supported, “production-safe” code versus undocumented processes.

Comments closed

Understanding DBCC TEC

Published 2017-03-31 by Kevin Feasel

Ewald Cress explains (but does not document!) an undocumented DBCC command:

Boring old disclaimer: What I am describing here is undocumented, unsupported, likely to change between versions, and will probably make you go blind. In fact, the depth of detail exposed illustrates one reason why Microsoft would not want to document it: if end users of SQL Server found a way to start relying on this not changing, it would hamstring ongoing SQL Server improvement and refactoring.

With that out of the way, let’s dive right into DBCC TEC, a command which can dump a significant chunk of the object tree supporting a SQL Server session. This result is the same thing that shows up within a dump file, namely the output of the CSession::Dump() function – it’s just that you can invoke this through DBCC without taking a dump (cue staring match with Kendra). Until corrected, I shall imagine that TEC stands for Thread Execution Context.

I appreciate Ewald’s ability to make sense out of the madness of database internals.

Comments closed

Recovery Without Write-Ahead Logging

Published 2017-03-24 by Kevin Feasel

Kendra Little thinks about scenarios in which SQL Server is not able to do write-ahead logging (like with memory-optimized objects):

A few episodes ago, I talked about how learning about Write Ahead Logging was a light bulb moment for me, and helped me learn tons of concepts about backups and recovery. This week, we talk about when SQL Server turns things upside down and doesn’t use write ahead logging: and what it has to do for recovery in these special cases.

Watch this week’s 24 minute video. Subscribe to my YouTube channel, or check out the audio podcast to listen anywhere, anytime. Links from this episode are below the video.

Click through for the video.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31