Press "Enter" to skip to content

Category: Indexing

Surrogate Versus Natural Keys

Kenneth Fisher digs into the debate on surrogate keys versus natural keys:

A natural key is one constructed of data that already exists in the table. For example using latitude and longitude in a table of addresses. Or the social security number in a table of employees. (Before you say anything, yes, the social security number is a horrible primary key. Be patient.)

My personal preference is to use surrogate keys most of the time and put unique constraints (or unique indexes) on the natural key.  There are some occasions in which I’d deviate, but ceteris paribus I’d pick this strategy..

Comments closed

Dynamic Index Generation

Brent Ozar generates 999 indexes:

The CHARACTER_MAXIMUM_LENGTH <> -1 OR IS NULL stuff is because I don’t want to try to index NVARCHAR(MAX) fields, but I do want to index NVARCHAR(50) fields.

The ORDER BY is because I wanted to get a wide variety of leading fields. If I just ordered by c1.COLUMN_NAME, I wouldn’t get all of the first fields covered in the first 999 indexes. (And I’m not even using a large table.)

Sometimes I think I’ve worked on systems which used this script to build indexes.  But then I read the index names:  “dta.”  And it all makes sense.

Comments closed

Spatial Fragmentation Viewer

Slava Murygin writes a spatial query which shows database fragmentation:

As you can see, I definitely have a lot of free space, but my data are so spread across the file and especially up to it’s border, that there is no way to make file size smaller.

If we zoom at the very tail we can figure out the names of tables at the very end of the file, which prevent file from shrinking:

This looks quite a bit like the old Windows 95 defrag tool.  I like it.

Comments closed

Thinking About Index Design

Jeremiah Peschka looks at a scenario in which a heap might be superior to a clustered index:

In this case, we have to assume that Event IDs may be coming from anywhere and, as such, may not arrive in order. Even though we’re largely appending to the table, we may not be appending in a strict order. Using a clustered index to support the table isn’t the best option in this case – data will be inserted somewhat randomly. We’ll spend maintenance cycles defragmenting this data.

Another downside to this approach is that data is largely queried by Owner ID. These aren’t unique, and one Owner IDcould have many events or only a few events. To support our querying pattern we need to create a multi-column clustering key or create an index to support querying patterns.

This result is not intuitive to me, and I recommend reading the whole thing.

Comments closed

Forwarded Records

Tara Kizer looks at forwarded records on heaps:

Forwarded records are rows in a heap that have been moved from the original page to a new page, leaving behind a forwarding record pointer on the original page to point at the new page. This occurs when you update a column that increases the size of the column and can no longer fit on the page. UPDATEs can cause forwarded records if the updated data does not fit on the page. Forwarding pointers are used to keep track of where the data is.

The comments are also worth reading.  Except for the terrible puns.

Comments closed

Indexes And Stats

Brent Ozar looks at a case when adding a suggested index monkeys with stats:

The query runs faster, make no mistake – but check out the estimates:

  • Estimated number of rows = 1
  • Actual number of rows = 165,367

Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?

Based on this result, there might be further optimizations available.  Read on for more of Brent’s thoughts.

Comments closed

Indexes On Disk

Kendra Little has a great diagram showing which indexes are disk-based and which are memory-resident:

I was looking through some terms in SQL Server documentation the other day, thinking about what it’s like to learn about SQL Server’s indexes when you’re new to the field. I jotted down a note: B-tree = Rowstore = Disk Based

And then I realized that’s not quite right.

Not all disk based indexes are traditional clustered and nonclustered indexes. Columnstore indexes are also disk based. Updatable Columnstore indexes use special rowstore B-trees behind the scenes. And Books Online says “rowstore” also refers to Memory-Optimized tables.

If you’re new to indexing, this picture will save you some learning time.

Comments closed

Filtered Indexes For Uniqueness

Shane O’Neill answers one of my favorite interview questions:

I used to think that this would be a complex requirement, possibly requiring aTRIGGER or two to check the inserted value against whatever is already there; but there is a way to have this functionality and have it the way that SQL Server normally would enforce a uniqueness on a column; by using a UNIQUE INDEX.

In case you’re thinking…

“Oh, a unique index doesn’t check what’s already there, is that it?”

I’m afraid that’s not the case.

This is one of my favorite uses of filtered indexes:  “limited” uniqueness.  In other words, I’m okay with an unlimited number of NULL values but all non-NULL values need to be unique.

Comments closed

Clustered Indexes

Derik Hammer looks at the power of clustered indexes:

The data in a clustered index is logically sorted but does not guarantee that it will be physically sorted. The physical sorting is simply a common misconception. In fact, the rows on a given page are not sorted even though all rows contained on that page will be appropriate to its place in the logical sort order. Also, the pages on disk are not guaranteed to be sorted by the logical key either.

The most likely time where you will have a clustered index that is physically sorted is immediately after an index rebuild operation. If you are trying to optimize for sequential reads, setting a fill factor to leave free space on your pages will help limit how often you have pages physically out of order at the expense of disk space.

Derik also discusses four qualities for a good clustered index.  My preferred acronym is NUSE (Narrow, Unique, Static, Ever-increasing); Derik uses slightly different terms.

Comments closed

The Secret Lives Of Seeks

Rob Farley digs into what happens with a seek operation:

Let’s go back to our original query, looking for address types 2, 4, and 5, (which returns 2 rows) and think about what’s going on inside the seek.

I’m going to assume the Query Engine has already done the work to figure out that the Index Seek is the right operation, and that it has the page number of the index root handy.

At this point, it loads that page into memory, if it’s not already there. That’s the first read that gets counted in the execution of the seek. Then it locates the page number for the row it’s looking for, and reads that page in. That’s the second read.

But we often gloss over that ‘locates the page number’ bit.

The upshot is rather interesting:  in certain edge cases, an uglier query can be better than an easier-to-understand query.  If you do this, however, you definitely want to document it; otherwise, you’ll leave the next maintainer (which could be you!) confused.

Comments closed