Press "Enter" to skip to content

Category: Internals

When SQL Server Sorts Can Rewind

Paul White turns back the hands of time:

Sorts use storage (memory and perhaps disk if they spill) so they do have a facility capable of storing rows between loop iterations. In particular, the sorted output can, in principle, be replayed (rewound).

Still, the short answer to the title question, “Do Sorts Rewind?” is:

Yes, but you won’t see it very often.

Read the whole thing.

Comments closed

How SQL Server Stores sql_variant Data

Randolph West answers one of the mysteries in life:

By a show of hands, who uses the sql_variant data type? In my experience the answer is “no one,” and that’s not necessarily a bad thing. There’s a general philosophy in our industry where we assume that columns are a specific data type. This predictability makes writing queries and applications against those columns easier. It’s difficult enough dealing with NULL values, so adding confusion with handling data conversions is an invitation to introduce bugs.

The sql_variant data type is a bit of a relic. I think the people who used that were also big variant data type users in Visual Basic. Nonetheless, Randolph clarifies how SQL Server stores this data.

Comments closed

The Merry-Go-Round Scan

David Fowler covers one of the best ways of optimizing frequent scans of large amounts of data:

As we all know, full table scans can be very expensive, poor old SQL is forced to read every single row in a table (of course that doesn’t always mean that it’s a bad choice for SQL).

Lets assume we’ve got a table scan happening that results in 1,000,000 page reads, that’s quite a bit of work for SQL to do. Now imagine another query comes in and needs to scan the same table, that’s also going to need to do 1,000,000 reads to get the data that it needs. If this table happens to be frequently accessed, this is soon going add up.

There’s a clever solution which tends to work better and better as you have more and more queries scanning the table.

Comments closed

Aggregate Splitting in SQL Server 2019

Paul White takes us through a new trick the optimizer has learned:

The extended event query_optimizer_batch_mode_agg_split is provided to track when this new optimization is considered. The description of this event is:

Occurs when the query optimizer detects batch mode aggregation is likely to spill and tries to split it into multiple smaller aggregations.

Other than that, this new feature hasn’t been documented yet. This article is intended to help fill that gap.

Read on as Paul fills that gap.

Comments closed

What MAXDOP Controls

Pedro Lopes gives us an explanation of what MAXDOP really does for us:

There are plenty of blogs on these topics, and the official documentation does a good job of explaining these (in my opinion). If you want to know more about the guidelines and ways to override for specific queries, refer to the Recommendations section in the Configure the max degree of parallelism Server Configuration Option documentation page.

But what does MAXDOP control? A common understanding is that it controls the number of CPUs that can be used by a query – previous revisions of the documentation used this abstraction. And while that is a correct abstraction, it’s not exactly accurate.

This is definitely a nice companion piece to Paul White’s article on how MAXDOP works.

Comments closed

Delayed Prefetch and Hidden Reads

Hugo Kornelis looks at when worlds collide:

So let’s check. The picture above shows, side by side, the properties of the Index Seek and the Key Lookup operator. They show that the Index Seek did 3 logical reads only, while Key Lookup did 650 logical reads. A clear indication where the majority of the work is done.

But wait. Aren’t we missing something?

The SET STATISTICS IO ON output indicates a total of 722 logical reads. The two screenshots above add up to 653 logical reads. Where are the other 69 logical reads?

Read on for the answer.

Comments closed

How SQL Server Stores UNIQUEIDENTIFIERS

Randolph West digs into what a UNIQUEIDENTIFIER looks like in storage:

Let’s take our example GUID again: CC05E271-BACF-4472-901C-957568484405. If we look at the table storage for this row, we’ll find it persisted as follows: 0x71E205CCCFBA7244901C957568484405 (alternating octets are highlighted in bold).

If you haven’t been following this series, this is a good place to remind you that SQL Server stores data using little-endian sequencing on disk and in memory. In the vast majority of cases, bytes are stored in reverse order because that’s how Intel CPUs like their data. However GUIDs are persisted slightly differently because of their sort order.

This is probably the most GUIDs I’ve seen in a single blog post.

Comments closed

The Table Scan Operator

Hugo Kornelis dives into a common operator:

The Table Scan operator is used to read all or most data from a table that has no clustered index (also known as a heap table, or just as a heap). In combination with a Top operator, it can also be used to read just a few rows from a heap table when data order is irrelevant and there is no nonclustered index that covers all required columns.

The basic behavior of a Table Scan operator is very similar to that of the Index Scan operator when it chooses to do an IAM scan, but with a few very important differences. A heap table has no root, intermediate, and leaf level pages; it has data pages only. Each page read from the IAM is a data page and can be processed. But rows on a data page of a heap table can contain forwarding pointers, that cause out of order data access.

I’d say something like “I hope you don’t have too many table scans” because that means a lot of heaps, though given the use of temp tables without clustered indexes, even that statement failed the nuance test.

Comments closed

Exchange Demand Partitioning and Parallel Queries

Joe Obbish takes us through a fun concurrency problem:

Very little has been written about exchange operators with a partitioning type of demand, so I forgive you for not hearing of it before today. There is a brief explanation available here, an example of using demand partitioning to improve some query plans involving partitioned tables, and a Stack Exchange answer for someone comparing round robin and demand partitioning. You have the honor of reading perhaps the fourth blog post about the subject.

Read on for an in-depth look at the problem.

Comments closed

The Architecture of Columnstore Indexes

Ed Pollack has started a series on columnstore indexing:

By storing data grouped by columns, like values can be grouped together and therefore compress very effectively. This compression will often reduce the size of a table by 10x and offers significant improvements over standard SQL Server compression.

For example, if a table with a billion rows has an ID lookup column that has 100 distinct values, then on average each value will be repeated 10 million times. Compressing sequences of the same value is easy and results in a tiny storage footprint.

Just like standard compression, when columnstore data is read into memory, it remains compressed. It is not decompressed until runtime when needed. As a result, less memory is used when processing analytic queries. This allows more data to fit in memory at one time, and the more operations that can be performed in memory, the faster queries can execute.

In scenarios where it makes sense, I absolutely love clustered columnstore indexes.

Comments closed