SQL Server 2016 and Azure SQL Database have increased the maximum size for index keys with nonclustered indexes. The new maximum key size for nonclustered indexes is 1700 bytes. The maximum key size for clustered indexes remains 900 bytes.
95-99% of the time, we don’t want to get anywhere near 900 bytes (much less 1700 bytes), but we all have that one edge case.
In SQL Server 2014 we have had 18 Extended Events and with Service Pack 1 we have received 1 more to be a total of 19 Extended Events for studying the Columnstore Indexes and the Batch Mode processing. In SQL Server 2016 that number has been greatly increased – there are whooping 61 Extended Events, that will give us an important insight into the Columnstore Indexes.
Even more important, Sunil & his team have given an own category inside the Extended Events – a category that is named Columnstore, which will ease the search for the basic columnstore events. Be aware though not all Extended Events related to Columnstore Indexes are included in that category – even including all channels will give you 41 Extended Events, while hiding the other 20 Extended Events, which are sometimes not categorised at all and at other times are stored under different categories, such as Execution or Error, for example. I believe the reason behind not changing the old Extended Events category is quite simple – Microsoft always looks for avoiding breaking existing applications.
There’s a lot here to digest, so read the whole thing.
A heap is a grouping of unsorted pages which are not linked. Page anatomy is out of scope for this series since all types of indexed and non-indexed tables use the same page structure but I do encourage you to check out here and here to learn more.
A heap is comprised of one or more index allocation map (IAM) pages which point to the data pages which make up the heap. The only exception to this is when you have a row which has been updated and could not fit in its page anymore. In that case, you get a forwarding pointer to the row which has been moved to an existing page with space or a new page. It is possible for you to produce a chain of forwarding records if the row continues to need relocation by further operations.
This is a good introduction to heaps and will feed into later work on how different types of indexes work.
On more than one occasion I have been told that the B-Tree index is the index structure which was designed after the A-Tree index. Another common misconception is that it stands for Binary-Tree. As logical as those may seem, they are false. The ‘B’ in B-Tree does not actually have any specific meaning. Check out Ed McCreight’s explanation here (16:08) where he admits that the name discussion was never settled.
In its most basic form, the B-Tree index is a hierarchy of data pages (page structures lightly touched on in the next post of this series). The lowest level is called the leaf level, the highest level is the index root, and all levels in between are the intermediate levels. This structure is an improvement over the Binary Tree index because its balanced nature greatly improved the performance of maintenance operations such as, INSERT, DELETE, and UPDATE.
On the terminological point, I’d always heard that the “B” stood for “Balanced” because of the level flatness—in contrast to a “normal” tree, you wouldn’t have more than a pre-defined number of levels separation (usually one) between leaf nodes.
So there I was, creating some indexes. Since I was in demo mode, I had execution plans turned on. I’m actually terrible about remembering to turn them off. There have been times when I’ve run loops to create some REALLY big tables, and left them turned on. SSMS crashed pretty quickly. Apparently generating execution plans isn’t free! Keep that in mind when you’re timing/tuning queries.
Since they were there, I took a look at them. They were basically what I expected. The only way to index data is to read it and sort it and blah blah blah. But there was a little something extra!
Yo dawg, I heard you like indexes…
My first encounter with full text indexes and degraded performance was related to an enhancement I made to an aspx page years ago. I wanted all of the search fields to use an AutoComplete AJAX extender to mimic the behavior you see when you type a few letters into the search field on Google.com or Bing.com. A traditional non-clustered index wasn’t sufficient for the “Location Address” field, so I settled on a full text index–it worked very well.
After some amount of time (I don’t remember how long), performance slowed considerably. I was surprised to find the full text index for “Location Address” had a large number of fragments. I wish I had kept some notes on my findings. I can’t remember how may fragments there were, but I’m thinking it was in the 15-20 range. If memory serves me, Orange Co., FL has about 400,000 physical location addresses. The underlying table had one row per location address. Knowing me, the indexed column was probably VARCHAR(100) or VARCHAR(128). This does’t seem like a huge amount of data, so I was surprised the full text searches were slow, even with 15-20 fragments. Reorganizing the related full text catalog made a world of difference. Performance improved drastically.
All indexes need maintenance. Dave has a script to help with full-text indexes.
Note that this script requires SQL Server 2016 (or later) because the database engine team made some great changes to columnstore indexes, allowing us to use REORGANIZE to clear out deleted rows and compact row groups together, as well as its previous job of marking open delta stores as available for compression.
The code is available as a Gist for now, at least until I decide what to do with it. Comments are welcome, especially if I’m missing a major reorganize condition.
As mentioned, comments are welcome.
You receive the error message similar to the following:
Msg 2552, Level 16, State 1, Line 1 The index “blah” (partition 1) on table “blah_blah_blah” cannot be reorganized because page level locking is disabled
Immediately, you start double-checking yourself and verifying that it worked the previous night. You even go so far as to confirm that the same index was previously reorganized. How is it possible that it is failing now on this index. What has changed? Has something changed?
There’s an interesting troubleshooting story, but the important message is about setting up a good set of Extended Events so that you can troubleshoot these types of problems.
So why would you want to do this? Well lets say for example you have a table in a database where the clustered index has become corrupted. Let’s further say that no one mentioned this to you for .. say a year. (No judging!) So your only option at this point might be to use the REPAIR_ALLOW_DATA_LOSS of DBCC CHECKDB. But when you are done how much data has actually been lost? Can you get any of it back?
If you’ve lived a good life and are very lucky, you might recover all data this way. Otherwise, it’s a good idea to run CHECKDB more frequently and check those backups regularly as well.
Sunil Agarwal has a couple of posts on columnstore index defragmentation in SQL Server 2016.
Let us now look at how you can use REORGANIZE command to defragment your columnstore index. Note, this command is only supported for clustered columnstore index (CCI) and nonclustered columnstore index for disk-based tables. In the example below, I create an empty table and then create a clustered columnstore index and finally I load 300k rows. SQL Server 2016 loads data from staging table into CCI in parallel when you specify TABLOCK hint. The machine I ran this test on has 4 logical processors so the 300k rows got divided into 75k each between 4 threads. Since each thread was loading < 102400 rows, the columnstore index ends up with 4 delta rowgroups as shown below.
A compressed rowgroup is considered as fragmented when any of the following two conditions is met
Less than 1 million rows but the trim_reason ( please refer to https://msdn.microsoft.com/en-us/library/dn832030.aspx ) is other than DICTIONARY_SIZE. If the size of a compressed rowgroup is reduced because it has reached the maximum dictionary size, then it can’t be further reduced
It has nonzero deleted rows that exceeds a minimum threshold.
I just got finished with a first draft of a script to determine whether reorganizing a clustered columnstore index partition would be worthwhile, so this is great timing. I hope to make my script available soon, after I incorporate Sunil’s heuristics.