Press "Enter" to skip to content

Category: Internals

The Architecture of Columnstore Indexes

Ed Pollack has started a series on columnstore indexing:

By storing data grouped by columns, like values can be grouped together and therefore compress very effectively. This compression will often reduce the size of a table by 10x and offers significant improvements over standard SQL Server compression.

For example, if a table with a billion rows has an ID lookup column that has 100 distinct values, then on average each value will be repeated 10 million times. Compressing sequences of the same value is easy and results in a tiny storage footprint.

Just like standard compression, when columnstore data is read into memory, it remains compressed. It is not decompressed until runtime when needed. As a result, less memory is used when processing analytic queries. This allows more data to fit in memory at one time, and the more operations that can be performed in memory, the faster queries can execute.

In scenarios where it makes sense, I absolutely love clustered columnstore indexes.

Comments closed

Understanding Heaps in SQL Server

Uwe Ricken has a series on the much-maligned heap:

This article is the beginning of a series of articles about Heaps in Microsoft SQL Server. Heaps are rejected by many database developers using Microsoft SQL Server. The concerns about Heaps are even fuelled by Microsoft itself by generally recommending the use of clustered indexes for every table. Globally renowned SQL Server experts also generally advise that tables in Microsoft SQL Server be provided with a clustered index.

Again, and again, I try to convince developers that a heap can even have advantages. I have discussed many pros and cons with these people and would now like to break a “PRO HEAP” lance. This article deals with the basics. Important system objects that play a major role in Heaps are only superficially presented in this article and described in detail in a follow up article.

I’m generally in the anti-heap camp, but I can acknowledge that there are situations in which heaps are better—I save my dogmatism for other things, like hating pie charts and loving representations of things as event streams.

Comments closed

Understanding RID Lookups

Hugo Kornelis takes us through an operator I usually don’t want to see:

The RID Lookup operator offers the same logical functionality within the execution plan as the Key Lookup operator. But where Key Lookup is used for tables that have a clustered index, RID Lookup is instead used when a table is “heap” (table without clustered index). It is used when another operator (usually an Index Seek, sometimes an Index Scan, rarely a combination of two or more of these or other operators) is used to find rows that need to be processed, but the index used does not include all columns needed for the query. The RID Lookup operator is then used to fetch the remaining columns from the heap structure where the table data is stored.

Click through for a great deal of information about RID Lookups.

Comments closed

Internal Storage of Numeric Values

Randolph West continues a series on how SQL Server stores values:

As we know from before, integers are whole numbers, or numbers with no fractions (i.e. no decimal places). This is going to be in the test later, so pay attention. In other words, the numbers 0 through 9 are integers, but a floating point or decimal / numeric value is not an integer. As soon as you add decimal places, it stops being an integer even if the fraction equates to zero.

Inside the storage engine, integers are mostly signed values (they can have negative values), and each integer data type has a fixed size. The exception is TINYINT which only has positive values. Like many other data types, integer types are stored byte-reversed (known as little-endian).

Click through for some good information from Randolph.

Comments closed

Thread Pool Exhaustion and Availability Groups

Sean Gallardy lays down the gauntlet:

You’re probably wondering why you couldn’t spawn a new thread, why this error happened, why you shouldn’t just up the max worker threads, and probably also thinking this must be some kind of “bug” in SQL Server. So here’s where our awkward conversation starts… It’s you. Contrary to every relationship breakup you’ve ever had, it’s definitely you. I’m not saying this to be mean but to really drive the point home here. The major reasons for this occurring are large blocking chains, too much workload for the server size (databases, users, etc.), and/or your virtual infrastructure sucks. There aren’t too many reasons for getting yourself into this situation, and while what I’ll be putting forth here isn’t exhaustive of all edge cases and scenarios, these are by far the majority of all the items in the wild that I’ve either worked on or have been involved in at some level. Side Note: If you’ve read this far, are shaking your head, calling me names that an irate sailor might utter, and telling yourself that upping the max worker threads as the product error suggests and Microsoft should fix their bugs then you can stop reading here as you’re probably not open to learning why you have issues in your environments.

One more scenario I’ve seen is mirroring thousands of databases on a single instance. That scenario fit none of Sean’s criteria—there was very little blocking, most of the databases were small and infrequently-used, and the infrastructure was the right size. It was just a huge number of databases and each database requiring a minimum of X worker threads. Mind you, it was still a bad idea…

Comments closed

Understanding Memory Grants

Taiob Ali walks us through the concept of memory grants:

DesiredMemory: Memory estimated to fit intermediate results in KB for the chosen degree of parallelism. If the query runs in serial mode, this is the same as SerialDesiredMemory.

(Amount needed to store all temporary rows in memory. This depends on the cardinality estimate, expected number rows and average size of row). This is called additional because a query can survive lack of such memory by storing part of temporary rows on hard disk. A query is not guaranteed to have the full amount if the total exceeds the preset limit.)

Read on for explanations of each of the elements in MemoryGrantInfo.

Comments closed

Change Tracking and Internal Tables

Tim Weigel continues a series on change tracking:

In my last post, I showed you how to configure change tracking at the table level and how to get configuration information about change tracking from the database engine. We looked at sys.change_tracking_databases and sys.change_tracking_tables, and looked at some sample scripts that present the information in a more readable format.

Before moving on to working with change tracking, I’d like to show you a little bit about how SQL Server handles change tracking data under the hood. Let’s take a few minutes to talk about sys.internal_tablessys.dm_tran_commit_table, and sys.syscommittab. These aren’t objects that most DBAs interact with on a routine basis, but they’re useful for understanding how change tracking does what it does.

Click through to learn more about these internal tables.

Comments closed

Date and Time Storage in SQL Server

Randolph West covers the internals of how date and time data types are stored in SQL Server:

DATE is the byte-reversed number of days since the year 0001-01-01, stored as three bytes. It goes up to 9999-12-31, which is stored as 0xDAB937. You can check this value by reversing the bytes and sticking them into a hex calculator. 37 B9 DA equals 3,652,058, which is the number of days since 0001-01-01.

If you try to cast 0xDBB937 as a DATE value (by incrementing the least significant bit DA by 1), it will throw a conversion error. There is obviously some overflow detection that protects against corruption in a date type.

Randolph looks at DATE, TIME, DATETIME(2), and DATETIME and explains how each is storedon a page.

Comments closed

Understanding Key Lookups

Hugo Kornelis continues a series on SQL Server plan operators:

The Key Lookup operator provides a subset of the functionality of the Clustered Index Seek operator, but within a specific context. It is used when another operator (usually an Index Seek, sometimes an Index Scan, rarely a combination of two or more of these or other operators) is used to find rows that need to be processed, but the index used does not include all columns needed for the query. The Key Lookup operator is then used to fetch the remaining columns from the clustered index.

A Key Lookup operator will always be found on the inner input of a Nested Loops operator. It will be executed once for each row found. Since the key values passed in always come from another index, the requested row will always exist (except in rare race scenarios when read uncommitted isolation level is used).

Click through for a great deal of information on key lookups.

Comments closed