SQL Server 2016 requires the following conditions to be met for parallel insert on a CCI:
- Must specify TABLOCK
- No NCI on the clustered columnstore index
- No identity column
- Database compatibility is set to 130
These restrictions are enforced in SQL Server 2016, but they cover important scenarios. We are looking into relaxing them in subsequent releases. Another interesting point is that you can also load into a ‘rowstore HEAP’ in parallel.
The restriction I’d most like to see reduced would be the “no non-clustered indexes” part. The rest seem forgivable for most clustered columnstore setups (i.e., fact tables).
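As a rough illustration of those four conditions, here is a minimal T-SQL sketch. The table names (dbo.DemoFactSales, dbo.DemoFactSales_Staging) are hypothetical and not from the original post, and whether the plan actually goes parallel still depends on the optimizer, row counts, and server settings.

```sql
-- Hypothetical demo objects; nothing here comes from the quoted post.
-- Condition: the database compatibility level should be 130.
SELECT name, compatibility_level
FROM   sys.databases
WHERE  database_id = DB_ID();

DROP TABLE IF EXISTS dbo.DemoFactSales, dbo.DemoFactSales_Staging;

CREATE TABLE dbo.DemoFactSales_Staging
(
    SaleDate  date  NOT NULL,
    ProductID int   NOT NULL,
    Quantity  int   NOT NULL,
    Amount    money NOT NULL
);

-- Conditions: no identity column and no nonclustered index on the target.
CREATE TABLE dbo.DemoFactSales
(
    SaleDate  date  NOT NULL,
    ProductID int   NOT NULL,
    Quantity  int   NOT NULL,
    Amount    money NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_DemoFactSales
    ON dbo.DemoFactSales;

-- Condition: TABLOCK must be specified on the insert target.
INSERT INTO dbo.DemoFactSales WITH (TABLOCK)
       (SaleDate, ProductID, Quantity, Amount)
SELECT SaleDate, ProductID, Quantity, Amount
FROM   dbo.DemoFactSales_Staging;
```

Checking the actual execution plan for a parallel insert operator is the only reliable way to confirm the load ran in parallel.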
Sunil Agarwal has a couple of posts explaining columnstore indexes. First, how columnstore indexes differ from classic B-tree indexes:
Index Fragmentation: A rowstore-based index is considered fragmented if (a) the physical order of pages is out of sync with the index-key order, or (b) the data pages (clustered index) or index pages (nonclustered index) are partially filled. A fragmented index will lead to significantly higher physical IOs and can potentially put more pressure on memory, which can ultimately slow down queries. Most organizations run a periodic index maintenance job to defragment indexes. For details, please refer to https://msdn.microsoft.com/en-us/library/ms189858.aspx#Fragmentation for best practices on how to maintain btree indexes. A columnstore index is considered fragmented if (a) 10% or more of the rows in a compressed rowgroup are marked as deleted, or (b) one or more smaller compressed rowgroups can be combined to create a larger compressed rowgroup such that the resultant compressed rowgroup has less than or equal to 1 million rows. Note that if a compressed rowgroup has fewer than 1 million rows due to dictionary size, it is not considered fragmented, because there is nothing that can be done to increase its size. Also recall that a columnstore index consists of zero or more delta rowgroups, as shown in the picture below.
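The two columnstore fragmentation conditions can be approximated with a query against sys.dm_db_column_store_row_group_physical_stats (available from SQL Server 2016). This is a sketch rather than the post's own method: the second predicate only flags undersized rowgroups as merge candidates, assuming the usual maximum of 1,048,576 rows per rowgroup, and it does not prove that they can actually be combined.

```sql
-- Lists compressed rowgroups that look fragmented under the two rules above.
SELECT OBJECT_SCHEMA_NAME(rg.object_id) AS schema_name,
       OBJECT_NAME(rg.object_id)        AS table_name,
       i.name                           AS index_name,
       rg.partition_number,
       rg.row_group_id,
       rg.total_rows,
       rg.deleted_rows,
       CAST(100.0 * rg.deleted_rows / NULLIF(rg.total_rows, 0)
            AS decimal(5, 2))           AS deleted_pct,
       rg.trim_reason_desc
FROM   sys.dm_db_column_store_row_group_physical_stats AS rg
JOIN   sys.indexes AS i
       ON  i.object_id = rg.object_id
       AND i.index_id  = rg.index_id
WHERE  rg.state_desc = 'COMPRESSED'
  AND  (
           -- (a) 10% or more of the rows are marked as deleted
           100.0 * rg.deleted_rows / NULLIF(rg.total_rows, 0) >= 10
           -- (b) undersized rowgroup that was not trimmed by dictionary size
        OR (rg.total_rows < 1048576
            AND rg.trim_reason_desc <> 'DICTIONARY_SIZE')
       )
ORDER BY table_name, rg.partition_number, rg.row_group_id;
```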
SQL Server 2016 provides two flavors of columnstore index: clustered (CCI) and nonclustered (NCCI). As shown in the simplified picture below, both indexes are organized as columns, but an NCCI is created on an existing rowstore table (shown on the right side of the picture), while a table with a CCI does not have a rowstore table. Both tables can have one or more btree nonclustered indexes.
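To make the two flavors concrete, here is a minimal sketch for SQL Server 2016 using hypothetical demo tables (dbo.DemoOrdersRowstore and dbo.DemoOrdersColumnstore are illustrative names, not from the quoted post).

```sql
-- Hypothetical demo tables for illustration only.
DROP TABLE IF EXISTS dbo.DemoOrdersRowstore, dbo.DemoOrdersColumnstore;

-- NCCI: the table stays a rowstore (here a clustered btree),
-- and the columnstore index is built on top of it.
CREATE TABLE dbo.DemoOrdersRowstore
(
    OrderID    int   NOT NULL PRIMARY KEY CLUSTERED,
    CustomerID int   NOT NULL,
    OrderDate  date  NOT NULL,
    Amount     money NOT NULL
);

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_DemoOrdersRowstore
    ON dbo.DemoOrdersRowstore (CustomerID, OrderDate, Amount);

-- CCI: the columnstore is the table's storage; there is no rowstore copy.
CREATE TABLE dbo.DemoOrdersColumnstore
(
    OrderID    int   NOT NULL,
    CustomerID int   NOT NULL,
    OrderDate  date  NOT NULL,
    Amount     money NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_DemoOrdersColumnstore
    ON dbo.DemoOrdersColumnstore;

-- In SQL Server 2016 a CCI table can also carry btree nonclustered indexes.
CREATE NONCLUSTERED INDEX IX_DemoOrdersColumnstore_OrderID
    ON dbo.DemoOrdersColumnstore (OrderID);
```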
If you haven’t looked at columnstore indexes yet, 2016 is a great time to start.
I will share a little secret with you: it’s all about the Batch Execution Mode in SQL Server 2014. All those Hash Match iterators are running in Batch Mode, even though we are not using a Columnstore Index anywhere.
In SQL Server 2016 this old (since 2012) functionality has been removed, and once you are running your queries at compatibility level 130 (SQL Server 2016), the queries that were taking advantage of it will run significantly slower.
There is a fast & brutal solution for that problem – set your compatibility level to 120, but do not go there until you have understood all the implications: some of the most important and magnificent improvements for the Batch Execution Mode are functioning only if your database is set to compatibility level 130: single threaded batch mode, batch sorting, window functions, etc.
From what I know, there is no way you can have all of those functionalities working together under the same hood and enjoy the old way of getting Batch Execution Mode without the presence of the Columnstore Index.
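For reference, checking and changing the compatibility level looks roughly like this. The database name DemoDW is hypothetical, and the ALTER statement is left commented out because, as the quoted post warns, dropping to 120 also gives up the batch mode improvements that require 130.

```sql
-- Show the compatibility level of the current database.
SELECT name, compatibility_level
FROM   sys.databases
WHERE  database_id = DB_ID();

-- The "fast & brutal" workaround from the post; DemoDW is a hypothetical name.
-- ALTER DATABASE DemoDW SET COMPATIBILITY_LEVEL = 120;
```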
The conclusion is a bit of a downer. Read the whole thing.
Niko Neugebauer talks about handling sequences and default values within columnstore indexes:
There are still no dictionaries, and trying to rebuild this table will not have any effect at all, but take a look at the size of the segments: their size was lowered by almost 40%, to ~1.6 MB!
This technique is very effective if you are compressing columns that you rarely access; it should be considered for log tables, for example.
Also notice that Columnstore Archival compression will not bring any significant changes: the original 2.6 MB will lower to 2.42 MB, while the variable char column will not get any further improvements, making the improvement difference around 32%.
Warning: Do NOT use this technique without understanding the consequences. The processing of such columns will be less effective, since Predicate Pushdown will work in a very limited way, and Segment Elimination will not work at all.
At this point you can notice that of our 3 tables only 2 are available for transactional replication; the disk-based table with a Clustered Columnstore Index is not available for replication.
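The excerpt does not show the compression technique itself, so the sketch below covers only the Columnstore Archival comparison it mentions: rebuild with COLUMNSTORE_ARCHIVE, then read segment sizes from sys.column_store_segments. The table dbo.DemoLogEntries is a hypothetical stand-in and starts empty here, so load your own data before comparing sizes.

```sql
-- Hypothetical demo table; it starts empty, so load data before comparing.
DROP TABLE IF EXISTS dbo.DemoLogEntries;

CREATE TABLE dbo.DemoLogEntries
(
    LogID   bigint        NOT NULL,
    Message varchar(4000) NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_DemoLogEntries
    ON dbo.DemoLogEntries;

-- Switch the index to Columnstore Archival compression.
ALTER INDEX CCI_DemoLogEntries ON dbo.DemoLogEntries
    REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Total on-disk segment size per columnstore column, in MB.
SELECT s.column_id,
       COUNT(*)                               AS segment_count,
       SUM(s.on_disk_size) / 1024.0 / 1024.0  AS size_mb
FROM   sys.column_store_segments AS s
JOIN   sys.partitions AS p
       ON p.hobt_id = s.hobt_id
WHERE  p.object_id = OBJECT_ID(N'dbo.DemoLogEntries')
GROUP BY s.column_id
ORDER BY s.column_id;
```

Running the size query before and after the rebuild gives the kind of before/after comparison the quoted post describes.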
This means that there are no improvements since SQL Server 2014 for the DWH/BI scenarios in this direction and this is definitely sad.
Well, we can always go in a different direction, as in the case of Availability Groups in SQL Server 2014, where readable secondaries are supported only for Nonclustered Columnstore Indexes. In SQL Server 2016 we can use a Nonclustered Columnstore Index even on all columns if needed and get the principal improvements for the Batch Execution Mode.
Notice here that even though we can select the InMemory tables with Clustered Columnstore, there are a couple of additional important settings that need to be configured to make things function. So clicking through the GUI Wizard will not set things up correctly by default.
The short answer, Niko states, is that you can only replicate non-clustered columnstore indexes at this time. I can see some use for replicating clustered columnstore tables (warehouse scale-out scenarios, perhaps), but it wasn’t at the top of my columnstore improvement list.
I stumbled upon this MSDN blog post: SQL Server 2016 SSIS Data Flow Buffer Auto Sizing capability benefits data loading on Clustered Columnstore tables (catchy title). It explains how you can set the buffer properties of the data flow to try to insert data directly into compressed row groups instead of in the delta store. They fail to achieve this using SSIS 2014 and then they explain how using the new AutoAdjustBufferSize property of SSIS 2016 works miracles and everything is loaded directly into compressed row groups. Hint: you want to avoid loading data into the delta store, as it is row storage and you need to wait for the tuple mover to load the data to the CCI in the background.
However, it’s still possible to achieve the same using SSIS 2014 (or earlier). Niko Neugebauer shows this in his post Clustered Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”). It still depends on the estimated row size, but using these settings you should get better results:
This advice is a bit different from loading standard rowstore-based tables, but serves to pack as many rows into each columnstore row group as possible.
In SQL Server 2014 we had 18 Extended Events, and Service Pack 1 added 1 more, for a total of 19 Extended Events for studying Columnstore Indexes and Batch Mode processing. In SQL Server 2016 that number has greatly increased: there are a whopping 61 Extended Events that will give us important insight into Columnstore Indexes.
Even more important, Sunil & his team have given them their own category inside the Extended Events, named Columnstore, which eases the search for the basic columnstore events. Be aware, though, that not all Extended Events related to Columnstore Indexes are included in that category: even including all channels will give you 41 Extended Events, hiding the other 20, which are sometimes not categorised at all and at other times are stored under different categories, such as Execution or Error. I believe the reason for not changing the categories of the old Extended Events is quite simple: Microsoft always tries to avoid breaking existing applications.
There’s a lot here to digest, so read the whole thing.
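If you want a quick look at what your own instance exposes, a name-based search over sys.dm_xe_objects is a reasonable starting point. The name patterns below are my own heuristic, not the post's list, so the result will not necessarily match the 61 events counted above.

```sql
-- Heuristic: find Extended Events whose names mention columnstore or batch.
SELECT p.name AS package_name,
       o.name AS event_name,
       o.description
FROM   sys.dm_xe_objects  AS o
JOIN   sys.dm_xe_packages AS p
       ON p.guid = o.package_guid
WHERE  o.object_type = 'event'
  AND  (   o.name LIKE '%columnstore%'
        OR o.name LIKE '%column[_]store%'
        OR o.name LIKE '%batch%')
ORDER BY p.name, o.name;
```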
There is a new line in the properties of the iterator, showing the number of locally aggregated rows and that number equals 619255, that should be exactly the number of rows that is missing from the arrow connecting 2 iterators:
select 12008353 + 619255
Gives us our perfect 12627608 rows.
Is there any more information on this operation?
Indeed, just right-click on the Columnstore Index Scan and select its properties:
This is tied to some columnstore performance improvements in SQL Server 2016.
Note that this script requires SQL Server 2016 (or later) because the database engine team made some great changes to columnstore indexes, allowing us to use REORGANIZE to clear out deleted rows and compact row groups together, as well as its previous job of marking open delta stores as available for compression.
The code is available as a Gist for now, at least until I decide what to do with it. Comments are welcome, especially if I’m missing a major reorganize condition.
As mentioned, comments are welcome.
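For readers who have not used the 2016 behavior yet, the underlying commands look roughly like this. This is not the Gist script, only a minimal sketch against a hypothetical table (dbo.DemoReorg).

```sql
-- Hypothetical demo table for illustration only.
DROP TABLE IF EXISTS dbo.DemoReorg;

CREATE TABLE dbo.DemoReorg
(
    ID      int          NOT NULL,
    Payload varchar(100) NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_DemoReorg
    ON dbo.DemoReorg;

-- Compresses closed delta rowgroups; in SQL Server 2016 it can also
-- remove deleted rows and merge small compressed rowgroups.
ALTER INDEX CCI_DemoReorg ON dbo.DemoReorg REORGANIZE;

-- Additionally forces open and closed delta rowgroups into the columnstore.
ALTER INDEX CCI_DemoReorg ON dbo.DemoReorg
    REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```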
I decided to take a serious step back and write about something that concerns the current version (SQL Server 2014) and the older version of SQL Server that supports Nonclustered Columnstore Indexes (SQL Server 2012).
The Nonclustered Columnstore Indexes in SQL Server 2012 & 2014 are non-updatable, meaning that after they are built on the table, you cannot modify the table anymore – you can only read the data from it.
The common solutions for this problem are:
– Using Partitioning
– Disabling the Columnstore Index, modifying the data, and then rebuilding the Columnstore Index (thus reactivating it)
Sounds easy, doesn’t it?
Well, as with everything in real life, there are a couple of quite important gotchas here.
The “non-updatable” part is why I ignored non-clustered columnstore indexes. With SQL Server 2016, I’m going to take another look at them. But if you’re living on 2012 or 2014 for a while, this is a good post to give you an idea of how to load those tables.
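The disable, modify, rebuild option from the list above can be sketched as follows for SQL Server 2012 or 2014. The table and index names are hypothetical, and the gotchas Niko describes (rebuild time on large tables, for one) are not addressed here.

```sql
-- Hypothetical demo table; syntax kept compatible with SQL Server 2012/2014.
IF OBJECT_ID(N'dbo.DemoSalesReadOnly', N'U') IS NOT NULL
    DROP TABLE dbo.DemoSalesReadOnly;

CREATE TABLE dbo.DemoSalesReadOnly
(
    SaleID   int   NOT NULL,
    SaleDate date  NOT NULL,
    Amount   money NOT NULL
);

-- Once this index exists, the table is read-only on 2012/2014.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_DemoSalesReadOnly
    ON dbo.DemoSalesReadOnly (SaleID, SaleDate, Amount);

-- Step 1: disable the columnstore index so the table accepts changes again.
ALTER INDEX NCCI_DemoSalesReadOnly ON dbo.DemoSalesReadOnly DISABLE;

-- Step 2: modify the data.
INSERT INTO dbo.DemoSalesReadOnly (SaleID, SaleDate, Amount)
VALUES (1, '20140101', 10.00);

-- Step 3: rebuild the index, which re-enables it.
ALTER INDEX NCCI_DemoSalesReadOnly ON dbo.DemoSalesReadOnly REBUILD;
```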