Note that this script requires SQL Server 2016 (or later) because the database engine team made some great changes to columnstore indexes, allowing us to use REORGANIZE to clear out deleted rows and compact row groups together, as well as its previous job of marking open delta stores as available for compression.
The code is available as a Gist for now, at least until I decide what to do with it. Comments are welcome, especially if I’m missing a major reorganize condition.
As mentioned, comments are welcome.
You receive the error message similar to the following:
Msg 2552, Level 16, State 1, Line 1 The index “blah” (partition 1) on table “blah_blah_blah” cannot be reorganized because page level locking is disabled
Immediately, you start double-checking yourself and verifying that it worked the previous night. You even go so far as to confirm that the same index was previously reorganized. How is it possible that it is failing now on this index. What has changed? Has something changed?
There’s an interesting troubleshooting story, but the important message is about setting up a good set of Extended Events so that you can troubleshoot these types of problems.
So why would you want to do this? Well lets say for example you have a table in a database where the clustered index has become corrupted. Let’s further say that no one mentioned this to you for .. say a year. (No judging!) So your only option at this point might be to use the REPAIR_ALLOW_DATA_LOSS of DBCC CHECKDB. But when you are done how much data has actually been lost? Can you get any of it back?
If you’ve lived a good life and are very lucky, you might recover all data this way. Otherwise, it’s a good idea to run CHECKDB more frequently and check those backups regularly as well.
Sunil Agarwal has a couple of posts on columnstore index defragmentation in SQL Server 2016.
Let us now look at how you can use REORGANIZE command to defragment your columnstore index. Note, this command is only supported for clustered columnstore index (CCI) and nonclustered columnstore index for disk-based tables. In the example below, I create an empty table and then create a clustered columnstore index and finally I load 300k rows. SQL Server 2016 loads data from staging table into CCI in parallel when you specify TABLOCK hint. The machine I ran this test on has 4 logical processors so the 300k rows got divided into 75k each between 4 threads. Since each thread was loading < 102400 rows, the columnstore index ends up with 4 delta rowgroups as shown below.
A compressed rowgroup is considered as fragmented when any of the following two conditions is met
Less than 1 million rows but the trim_reason ( please refer to https://msdn.microsoft.com/en-us/library/dn832030.aspx ) is other than DICTIONARY_SIZE. If the size of a compressed rowgroup is reduced because it has reached the maximum dictionary size, then it can’t be further reduced
It has nonzero deleted rows that exceeds a minimum threshold.
I just got finished with a first draft of a script to determine whether reorganizing a clustered columnstore index partition would be worthwhile, so this is great timing. I hope to make my script available soon, after I incorporate Sunil’s heuristics.
Testing SQL Server 2016 RC0 today, I see that the bug for index_usage_stats has been fixed in this release! When I generate scans, seeks, and updates against an index, running ALTER INDEX REBUILD no longer resets the information. I can still tell which indexes have been used and which have not since the database came online, just like we had in SQL Server 2008 R2 and prior.
Kendra has created a Connect item to fix a separate bug where rebuilding an index clears out the missing index recommendations for that table.
But if we hover over that index seek, we can see in the tooltip that there’s a hidden predicate that is NOT a seek predicate. This is a hidden filter. And because this is SQL Server 2016, we can see “Number of Rows Read” — it had to read 9.3 million rows to count 1.9 million rows. It didn’t realize the 2006 rows were together– it checked all the females and examined the FakeBirthDateStamp column for each row.
Built-in functions can cause SARGability issues.
NC_TABLE1 is 36 total extents. 288 eight k pages. 2.25 mb. It can be read in 5 reads – one read for each contiguous run.CI_TABLE1 is comprised of 48 extents. 3 mb. It can be read in 11 reads – again, one for each contiguous run.The SQL Server instance has the -E startup option enabled. Without that startup option, proportional fill algorithm would distribute incoming data to the 8 data files with a potential smallest allocation of a single 64k extent before rotating round-robin to the additional files in the filegroup. With the -E startup option, the smallest possible new allocation by proportional fill is sixty four 64k extents – 4 mb.That means if I can rebuild the indexes into new space at the tail of the files, the contiguity should be improved considerably.
I had never considered that the scenario described here before, so this was definitely interesting.
In SQL Server 2016 the OLTP Systems have received a significant improvement – support for the Columnstore Indexes (disk-based Nonclustered Columnstore & In-memory based Clustered Columnstore).
In both cases we have as the base the underlying OLTP-style table, with a Delta-Store object (or Tail Row Group for InMemory tables), that will hold the new data being inserted or updated by the final users. The data that is being frequently updated in OLTP-style systems is called Hot Data. The data that just being inserted into your table is definitely a Hot Data.
The important moment for the table is when the data becomes Cold or mostly infrequently read-accessed, and meaning that it can be compressed into Columnstore format.
This does seem interesting and can be very helpful in using columnstore indexes across different data patterns.
Remember that this is an application problem and is not a SQL problem. We only get into trouble when applications (or people) expect results to be sorted when they’re not. So unless you have a tiny application, or a huge amount of discipline, it’s likely that there is some part of your application that assumes sorted results when it shouldn’t.
Here’s a method I used that attempts to identify such areas, exposing those assumptions. It involves reversing indexes.
It’s an interesting idea to try out in a dev environment.
We have had an index job that has been failing for a while. This is one of those things you really don’t want to clean up because no one is complaining, but you know you should. I had heard that I could rebuild one partition at a time, but where to start? Today, I worked my way through it, so here it is so that you can do it too.
First you need to find the biggest indexes, there is a good chance those are the ones that live on partitions. I am removing Primary Keys.
This can be a real time-saver if a majority of your partitions either are read-only or at least rarely update.