Press "Enter" to skip to content

Category: Internals

Diving Into the Window Spool Operator

Hugo Kornelis continues a series on execution plan operators:

The Window Spool operator is one of the four spool operators that SQL Server supports. Like other spool operators, it retains a copy of data it receives and can then return those rows as often as needed. The specific functionality of the Window Spool operator allows it to replay rows within a window, as defined in a ROWS or RANGE specification of an OVER clause.

Read on to see how these work, as well as a few differences from their spool brethren.

Comments closed

Making Use of Sort Rewinds: Closest Match

Paul White follows up on an article:

In When Do SQL Server Sorts Rewind? I described how most sorts can only rewind when they contain at most one row. The exception is in-memory sorts, which can rewind at most 500 rows and 16KB of data.

These are certainly tight restrictions, but we can still make use of them on occasion.

To illustrate, I am going reuse a demo Itzik Ben-Gan provided in part one of his Closest Match series, specifically solution 2 (modified value range and indexing).

Click through for the explanation.

Comments closed

When SQL Server Sorts Can Rewind

Paul White turns back the hands of time:

Sorts use storage (memory and perhaps disk if they spill) so they do have a facility capable of storing rows between loop iterations. In particular, the sorted output can, in principle, be replayed (rewound).

Still, the short answer to the title question, “Do Sorts Rewind?” is:

Yes, but you won’t see it very often.

Read the whole thing.

Comments closed

How SQL Server Stores sql_variant Data

Randolph West answers one of the mysteries in life:

By a show of hands, who uses the sql_variant data type? In my experience the answer is “no one,” and that’s not necessarily a bad thing. There’s a general philosophy in our industry where we assume that columns are a specific data type. This predictability makes writing queries and applications against those columns easier. It’s difficult enough dealing with NULL values, so adding confusion with handling data conversions is an invitation to introduce bugs.

The sql_variant data type is a bit of a relic. I think the people who used that were also big variant data type users in Visual Basic. Nonetheless, Randolph clarifies how SQL Server stores this data.

Comments closed

The Merry-Go-Round Scan

David Fowler covers one of the best ways of optimizing frequent scans of large amounts of data:

As we all know, full table scans can be very expensive, poor old SQL is forced to read every single row in a table (of course that doesn’t always mean that it’s a bad choice for SQL).

Lets assume we’ve got a table scan happening that results in 1,000,000 page reads, that’s quite a bit of work for SQL to do. Now imagine another query comes in and needs to scan the same table, that’s also going to need to do 1,000,000 reads to get the data that it needs. If this table happens to be frequently accessed, this is soon going add up.

There’s a clever solution which tends to work better and better as you have more and more queries scanning the table.

Comments closed

Aggregate Splitting in SQL Server 2019

Paul White takes us through a new trick the optimizer has learned:

The extended event query_optimizer_batch_mode_agg_split is provided to track when this new optimization is considered. The description of this event is:

Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.

Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.

Read on as Paul fills that gap.

Comments closed

What MAXDOP Controls

Pedro Lopes gives us an explanation of what MAXDOP really does for us:

There are plenty of blogs on these topics, and the official documentation does a good job of explaining these (in my opinion). If you want to know more about the guidelines and ways to override for specific queries, refer to the Recommendations section in the Configure the max degree of parallelism Server Configuration Option documentation page.

But what does MAXDOP control? A common understanding is that it controls the number of CPUs that can be used by a query – previous revisions of the documentation used this abstraction. And while that is a correct abstraction, it’s not exactly accurate.

This is definitely a nice companion piece to Paul White’s article on how MAXDOP works.

Comments closed

Delayed Prefetch and Hidden Reads

Hugo Kornelis looks at when worlds collide:

So let’s check. The picture above shows, side by side, the properties of the Index Seek and the Key Lookup operator. They show that the Index Seek did 3 logical reads only, while Key Lookup did 650 logical reads. A clear indication where the majority of the work is done.

But wait. Aren’t we missing something?

The SET STATISTICS IO ON output indicates a total of 722 logical reads. The two screenshots above add up to 653 logical reads. Where are the other 69 logical reads?

Read on for the answer.

Comments closed

How SQL Server Stores UNIQUEIDENTIFIERS

Randolph West digs into what a UNIQUEIDENTIFIER looks like in storage:

Let’s take our example GUID again: CC05E271-BACF-4472-901C-957568484405. If we look at the table storage for this row, we’ll find it persisted as follows: 0x71E205CCCFBA7244901C957568484405 (alternating octets are highlighted in bold).

If you haven’t been following this series, this is a good place to remind you that SQL Server stores data using little-endian sequencing on disk and in memory. In the vast majority of cases, bytes are stored in reverse order because that’s how Intel CPUs like their data. However GUIDs are persisted slightly differently because of their sort order.

This is probably the most GUIDs I’ve seen in a single blog post.

Comments closed

The Table Scan Operator

Hugo Kornelis dives into a common operator:

The Table Scan operator is used to read all or most data from a table that has no clustered index (also known as a heap table, or just as a heap). In combination with a Top operator, it can also be used to read just a few rows from a heap table when data order is irrelevant and there is no nonclustered index that covers all required columns.

The basic behavior of a Table Scan operator is very similar to that of the Index Scan operator when it chooses to do an IAM scan, but with a few very important differences. A heap table has no root, intermediate, and leaf level pages; it has data pages only. Each page read from the IAM is a data page and can be processed. But rows on a data page of a heap table can contain forwarding pointers, that cause out of order data access.

I’d say something like “I hope you don’t have too many table scans” because that means a lot of heaps, though given the use of temp tables without clustered indexes, even that statement failed the nuance test.

Comments closed