Filtered Indexes For Uniqueness

Shane O’Neill answers one of my favorite interview questions:

I used to think that this would be a complex requirement, possibly requiring aTRIGGER or two to check the inserted value against whatever is already there; but there is a way to have this functionality and have it the way that SQL Server normally would enforce a uniqueness on a column; by using a UNIQUE INDEX.

In case you’re thinking…

“Oh, a unique index doesn’t check what’s already there, is that it?”

I’m afraid that’s not the case.

This is one of my favorite uses of filtered indexes:  “limited” uniqueness.  In other words, I’m okay with an unlimited number of NULL values but all non-NULL values need to be unique.

Clustered Indexes

Derik Hammer looks at the power of clustered indexes:

The data in a clustered index is logically sorted but does not guarantee that it will be physically sorted. The physical sorting is simply a common misconception. In fact, the rows on a given page are not sorted even though all rows contained on that page will be appropriate to its place in the logical sort order. Also, the pages on disk are not guaranteed to be sorted by the logical key either.

The most likely time where you will have a clustered index that is physically sorted is immediately after an index rebuild operation. If you are trying to optimize for sequential reads, setting a fill factor to leave free space on your pages will help limit how often you have pages physically out of order at the expense of disk space.

Derik also discusses four qualities for a good clustered index.  My preferred acronym is NUSE (Narrow, Unique, Static, Ever-increasing); Derik uses slightly different terms.

The Secret Lives Of Seeks

Rob Farley digs into what happens with a seek operation:

Let’s go back to our original query, looking for address types 2, 4, and 5, (which returns 2 rows) and think about what’s going on inside the seek.

I’m going to assume the Query Engine has already done the work to figure out that the Index Seek is the right operation, and that it has the page number of the index root handy.

At this point, it loads that page into memory, if it’s not already there. That’s the first read that gets counted in the execution of the seek. Then it locates the page number for the row it’s looking for, and reads that page in. That’s the second read.

But we often gloss over that ‘locates the page number’ bit.

The upshot is rather interesting:  in certain edge cases, an uglier query can be better than an easier-to-understand query.  If you do this, however, you definitely want to document it; otherwise, you’ll leave the next maintainer (which could be you!) confused.

Finding Unused Indexes

SQLWayne has a script to help find unused indexes:

Here’s some code that can show you what indexes are unused or empty.  An empty index just means that there’s no data in that table right now, it may always be populated later, so I would not drop an empty index.  Besides, how much space would an empty index take?

For my personal preferences, I order the output by table then index name, also I put a u.* at the end of the select statement so the more interesting usage stat columns can be seen.

If an index truly is unused, it’s a waste of resources.  The problem is, sometimes you’ll think an index is unused but it’s really a vital part of month-end reporting or used for the CEO’s favorite dashboard.

Included Columns

Kendra Little has a new video involving indexing:

Recently, I was thinking about nonclustered indexes in SQL Server, and how included columns are stored. Is SQL Server smart enough to optimize the storage for small indexes with includes? Find out in this free seven minute video.

It’s a short video, well worth your time.

Script Those Indexes

Kendra Little provides a T-SQL script to script out all indexes on a database:

Sometimes you need to script out all the indexes in a database.

Maybe you’re concerned something has changed since they were last checked in.

Or maybe the indexes aren’t checked into source control, and you’re working on fixing that. (Important!)

Either way, sometimes you need to do it, and it’s not fun through the GUI. I needed to write some fresh demo code for this recently, and I needed it to give the details for partitioned tables using data compression, and I thought I’d share.

The fact that the built-in Generate Scripts does not include compression is annoying, but Kendra’s script does.  For bonus points, use Powershell to update scripts automatically with index changes and check them into your source control system of choice.

Increased NCI Key Size

Jos de Bruijn notes that non-clustered index keys may be larger in SQL Server 2016 and Azure SQL Database:

SQL Server 2016 and Azure SQL Database have increased the maximum size for index keys with nonclustered indexes. The new maximum key size for nonclustered indexes is 1700 bytes. The maximum key size for clustered indexes remains 900 bytes.

95-99% of the time, we don’t want to get anywhere near 900 bytes (much less 1700 bytes), but we all have that one edge case.

New Columnstore Extended Events

Niko Neugebauer talks about extended events relating to columnstore indexes in SQL Server 2016:

In SQL Server 2014 we have had 18 Extended Events and with Service Pack 1 we have received 1 more to be a total of 19 Extended Events for studying the Columnstore Indexes and the Batch Mode processing. In SQL Server 2016 that number has been greatly increased – there are whooping 61 Extended Events, that will give us an important insight into the Columnstore Indexes.

Even more important, Sunil & his team have given an own category inside the Extended Events – a category that is named Columnstore, which will ease the search for the basic columnstore events. Be aware though not all Extended Events related to Columnstore Indexes are included in that category – even including all channels will give you 41 Extended Events, while hiding the other 20 Extended Events, which are sometimes not categorised at all and at other times are stored under different categories, such as Execution or Error, for example. I believe the reason behind not changing the old Extended Events category is quite simple – Microsoft always looks for avoiding breaking existing applications.

There’s a lot here to digest, so read the whole thing.

Heaps

Derik Hammer walks us through heap tables as part of his indexing series:

A heap is a grouping of unsorted pages which are not linked. Page anatomy is out of scope for this series since all types of indexed and non-indexed tables use the same page structure but I do encourage you to check out here and here to learn more.

A heap is comprised of one or more index allocation map (IAM) pages which point to the data pages which make up the heap. The only exception to this is when you have a row which has been updated and could not fit in its page anymore. In that case, you get a forwarding pointer to the row which has been moved to an existing page with space or a new page. It is possible for you to produce a chain of forwarding records if the row continues to need relocation by further operations.

This is a good introduction to heaps and will feed into later work on how different types of indexes work.

B-Tree Indexes

Derik Hammer is starting a new series on index types and has started with the venerable B-tree:

On more than one occasion I have been told that the B-Tree index is the index structure which was designed after the A-Tree index. Another common misconception is that it stands for Binary-Tree. As logical as those may seem, they are false. The ‘B’ in B-Tree does not actually have any specific meaning. Check out Ed McCreight’s explanation here (16:08) where he admits that the name discussion was never settled.

In its most basic form, the B-Tree index is a hierarchy of data pages (page structures lightly touched on in the next post of this series). The lowest level is called the leaf level, the highest level is the index root, and all levels in between are the intermediate levels. This structure is an improvement over the Binary Tree index because its balanced nature greatly improved the performance of maintenance operations such as, INSERT, DELETE, and UPDATE.

On the terminological point, I’d always heard that the “B” stood for “Balanced” because of the level flatness—in contrast to a “normal” tree, you wouldn’t have more than a pre-defined number of levels separation (usually one) between leaf nodes.

Categories

May 2017
MTWTFSS
« Apr  
1234567
891011121314
15161718192021
22232425262728
293031