Note that this script requires SQL Server 2016 (or later) because the database engine team made some great changes to columnstore indexes, allowing us to use REORGANIZE to clear out deleted rows and compact row groups together, as well as its previous job of marking open delta stores as available for compression.
The code is available as a Gist for now, at least until I decide what to do with it. Comments are welcome, especially if I’m missing a major reorganize condition.
As mentioned, comments are welcome.
I decided to make a serious step back and write about something that is concerning the current (SQL Server 2014) and the elder version of SQL Server that supports Nonclustered Columnstore Indexes – (SQL Server 2012).
The Nonclustered Columnstore Indexes in SQL Server 2012 & 2014 are non-updatable, meaning that after they are built on the table, you cannot modify the table anymore – you can only read the data from it.
The common solutions for this problem are:
– Using Partitioning
– Disabling Columnstore, modifying the data and Rebuilding the Columnstore Index then (thus activating it)
Sounds easy, doesn’t it ?
Well, like with everything in the real life, there are a couple of quite important gotchas here.
The “non-updatable” part is why I ignored non-clustered columnstore indexes. With SQL Server 2016, I’m going to take another look at them. But if you’re living on 2012 or 2014 for a while, this is a good post to give you an idea of how to load those tables.
Sunil Agarwal has a couple of posts on columnstore index defragmentation in SQL Server 2016.
Let us now look at how you can use REORGANIZE command to defragment your columnstore index. Note, this command is only supported for clustered columnstore index (CCI) and nonclustered columnstore index for disk-based tables. In the example below, I create an empty table and then create a clustered columnstore index and finally I load 300k rows. SQL Server 2016 loads data from staging table into CCI in parallel when you specify TABLOCK hint. The machine I ran this test on has 4 logical processors so the 300k rows got divided into 75k each between 4 threads. Since each thread was loading < 102400 rows, the columnstore index ends up with 4 delta rowgroups as shown below.
A compressed rowgroup is considered as fragmented when any of the following two conditions is met
Less than 1 million rows but the trim_reason ( please refer to https://msdn.microsoft.com/en-us/library/dn832030.aspx ) is other than DICTIONARY_SIZE. If the size of a compressed rowgroup is reduced because it has reached the maximum dictionary size, then it can’t be further reduced
It has nonzero deleted rows that exceeds a minimum threshold.
I just got finished with a first draft of a script to determine whether reorganizing a clustered columnstore index partition would be worthwhile, so this is great timing. I hope to make my script available soon, after I incorporate Sunil’s heuristics.
There is a very usable support for Columnstore Indexes within the temporary objects, but they are not appearing in any of the DMV’s to be analysed or optimised. This is especially sad in the relation to the global temporary tables which are some of the more useful temporary objects.
For the most part, I’d consider these reasonable results. Hopefully we can get columnstore stats on temp tables, but even that’s not a huge loss.
Do you need to be concerned about that a delta rowgroup is scanned single threaded? The answer is NO for two reasons (a) most columnstore indexes have very few delta rowgroups (b) if you have multiple delta rowgroups, they can be scanned in parallel with one thread per delta rowgroup
I have a beef with (a), at least for SQL Server 2014, but that’s a story for another day.
Recall that on rowstore tables (i.e. the tables organized as rows not as columnstore), SQL Server requires you to specify TABLOCK for parallel bulk import to get minimal logging and locking optimizations. One key difference for tables with clustered columnstore index is that you don’t need TABLOCK for getting locking/logging optimizations for bulk import. The reasons for this difference in behavior is that each bulk import thread can load data exclusively into a columnstore rowgroup. If the batch size < 102400, then the data is imported into a delta rowgroup otherwise a new compressed rowgroup is created and the data is loaded into it. Let us take two following interesting cases to show this bulk import behavior. Assume you are importing 4 data files, each with one bulk import thread, concurrently into a table with clustered columnstore index
The “don’t use TABLOCK” is interesting in comparison to rowstore tables.
To solve the performance problem I went straight to the DefaultBufferMaxRows setting and set it to be equal of the maximum number of rows in a Row Group – 1048576. Together with the AutoAdjustBufferSize setting it helps the actual current size of the DataFlow Buffer that will be used for transferring the data from the source to the destination table.
What should I say – it worked like magic:
I guess that with 2:09 Minutes the clear winner of this test is the configuration with AutoAdjustBufferSize set to True and the DefaultBufferMaxRows to 1048576. It took less then a half of the time with just AutoAdjustBufferSize activated and the insertion process was executed with the help of the Bulk Load API – meaning that we did not have to wait for the Tuple Mover or to execute it manually.
Doubling insertion performance is nothing to scoff, especially for something like columnstore tables, where we expect millions (or more) of rows to be inserted.
In SQL Server 2016 the OLTP Systems have received a significant improvement – support for the Columnstore Indexes (disk-based Nonclustered Columnstore & In-memory based Clustered Columnstore).
In both cases we have as the base the underlying OLTP-style table, with a Delta-Store object (or Tail Row Group for InMemory tables), that will hold the new data being inserted or updated by the final users. The data that is being frequently updated in OLTP-style systems is called Hot Data. The data that just being inserted into your table is definitely a Hot Data.
The important moment for the table is when the data becomes Cold or mostly infrequently read-accessed, and meaning that it can be compressed into Columnstore format.
This does seem interesting and can be very helpful in using columnstore indexes across different data patterns.
Since then, I’ve wondered if ColumnStore indexes (both clustered and non-clustered) might help any of these scenarios. TL;DR: Based on this experiment in isolation, the answer to the title of this post is a resounding NO. If you don’t want to see the test setup, code, execution plans, or graphs, feel free to skip to my summary, keeping in mind that my analysis is based on a very specific use case.
I actually would have been surprised to find the answer here to be “yes.” Columnstore is designed with aggregation in mind, rather than pulling out a fairly small subset of the data.
There you have it; our recommendation is to choose a batchsize of > 102400 to get benefits of minimal logging with clustered columnstore index. In the next blog, I will discuss parallel bulk import and locking optimizations.
My experience is that you really want to insert in large batches.
Niko Neugebauer has two new posts up on columnstore index changes with SQL Server 2016.
Row Group merging & cleanup is a very long waited improvement that came out in SQL Server 2016. Once Microsoft has announced this functionality, everyone who has worked with SQL Server 2014 & Clustered Columnstore Indexes has rejoiced – one of the major problems with logical fragmentation because of the deleted data is solved! Amazing!
Just as a reminder – logical fragmentation is the process when we mark obsolete data in the Deleted Bitmap (in Columnstore Indexes there is no direct data removal from the compressed Segments with Delete command and Update command uses Deleted Bitmap as well marking old versions of rows as deleted).
Stretch DB or alternatively Stretch Database is a way of spreading your table between SQL Server (on-premises, VM in Azure) and a Azure SQLDatabase. This means that the dat of the table will shared between the SQL Server and the Azure SQLDatabase giving the opportunity to lower the total cost of the local storage, since Azure SQLDatabase is cheap relatively expensive storage typically used on the local SQL Server installations.
This mean that the table data will be separated intoHot Data & Cold Data, where Hot Data is the type of data that is frequently accessed and it extremely important (this is typically some OLTP data) and the Cold Data (this is typically rarely or almost never accessed archival or log data).
For the final user the experience should be the same as before – should he ask for some data that is not on the SQL Server, then it will be read from Azure SQLDatabase by the invocation of remote query, joined with the local results (if any) and then presented to the user.
These two posts are must-reads if you work with columnstore indexes.