Starting with SQL Server 2016, if you have enough RAM and suffering from the TempDB Spills that do have a significant impact on your workload, then you can enable the Trace Flag 9389 that will enable Batch Mode Iterators to request additional memory for the work and thus avoiding producing additional unnecessary I/O.
I am glad that Microsoft has created this functionality and especially that at the current release, it is hidden behind this track, and so Microsoft can learn from the applications before enabling it by default, hopefully in the next major release of SQL Server.
There’s a lot of good stuff in here. Read the whole thing.
I am extremely proud to share with everyone the news that the long-awaited and quite overdue release of the CISL – Columnstore Indexes Scripts Library is finally public – the 1.3.0 version. The most important part of this release is the support of the SQL Server 2016 & Azure SQLDatabase – all 3 scenarios (Nonclustered Columnstore, Disk-based Clustered Columnstore & the Memory-Optimised Clustered Columnstore Indexes) are included.
You will be able to explore all the important new architecture objects, such as Deleted Buffer & Deleted Table, plus the scripts for every version supports the new output results, even though there were no In-Memory tables in SQL Server 2012 for example.
If you use columnstore indexes, check this out.
SQL Server 2016 requires following conditions to be met for parallel insert on CCI
- Must specify TABLOCK
- No NCI on the clustered columnstore index
- No identity column
- Database compatibility is set to 130
While these restrictions are enforced in SQL Server 2016 but they represent important scenarios. We are looking into relaxing these in subsequent releases. Another interesting point is that you can also load into ‘rowstore HEAP’ in parallel as well.
The restriction I’d most like to see reduced would be the “no non-clustered indexes” part. The rest seem forgivable for most clustered columnstore setups (i.e., fact tables).
Sunil Agarwal has a couple of posts explaining columnstore indexes. First, how columnstore indexes differ from classic B-tree indexes:
Index Fragmentation: For rowstore based indexes, it is considered fragmented if (a) the physical order of pages in out of sync with the index-key order. (b) the data pages (clustered index) or index pages (for nonclustered index) are partially filled. A fragmented index will lead to significantly higher physical IOs and can potentially put more pressure on memory which can ultimately slowdown queries. Most organizations run a periodic index maintenance job to defragment indexes. For details, please refer to https://msdn.microsoft.com/en-us/library/ms189858.aspx#Fragmentation best practices on how to maintain btree indexes. For columnstore index, an index fragmentation is considered fragmented if (a) there are 10% or more rows marked as deleted in a compressed rowgroup (b) one or more smaller compressed rowgroups can be combined to create a larger compressed rowgroup such that the resultant compressed rowgroup has less than or equal to 1 million rows. Note, if a compressed rowgroup has less than 1 million rows due to dictionary size, it is not considered fragmented because there is nothing that can be done to increase its size. Also recall that a columnstore index consists of zero or more delta rowgroups as shown the in the picture below.
SQL Server 2016 provides two flavors of columnstore index; clustered (CCI) and nonclustered (NCCI) columnstore index. As shown in the simplified picture below, both indexes are organized as columns but NCCI is created on an existing rowstore table as shown on the right side in the picture below while a table with CCI does not have a rowstore table. Both tables can have one or more btree nonclustered indexes.
If you haven’t looked at columnstore indexes yet, 2016 is a great time to start.
I will share a little secret with you – it’s all about the Batch Execution Mode in SQL Server 2014: all those Hash Match iterators are running in Batch Mode, even though we are not using Columnstore Index anywhere.
In SQL Server 2016 this old (since 2012) functionality has been removed and once you are running your queries in the compatibility level of 130 (SQL Server 2016), your queries that were taking advantage of it – will be running significantly slower.
There is a fast & brutal solution for that problem – set your compatibility level to 120, but do not go there until you have understood all the implications: some of the most important and magnificent improvements for the Batch Execution Mode are functioning only if your database is set to compatibility level 130: single threaded batch mode, batch sorting, window functions, etc.
From what I know, there is no way you can have all of those functionalities working together under the same hood and enjoy the old way of getting Batch Execution Mode without the presence of the Columnstore Index.
The conclusion is a bit of a downer. Read the whole thing.
Niko Neugebauer talks about handling sequences and default values within columnstore indexes:
There are still no dictionaries – and trying to rebuild this table will not bring any effect at all, but take a look at the size of the segments – their size was lowered for almost 40% to ~1.6 MB!
This technic is very effective if you are compressing the columns that you do access rarely – it should be considered for the log tables for example.
Also notice that Columnstore Archival compression will not bring any significant changes – the original 2.6 MB will lower to 2.42 while the variable char column will not get any further improvements, making the improvement difference around 32%.
Warning: Do NOT use this technic without understanding the consequences – the processing of such columns will lower their effectiveness, since Predicate Pushdown will work in a very limited way, plus the Segment Elimination will not work at all.
At this point you can notice, that from our 3 tables there are only 2 are available for the transactional replication, with the disk-based table with a Clustered Columnstore Index is not being available for the replication.
This means that there are no improvements since SQL Server 2014 for the DWH/BI scenarios in this direction and this is definitely sad.
Well, we can always go a different direction, like in the case of Availability Groups in SQL Server 2014, where readable secondaries are supported only for the Nonclustered Columnstore Indexes. In SQL Server 2016 we can use Nonclustered Columnstore Index even on all columns if needed and get the principle improvements for the Batch Execution Mode.
Notice here that even though we can select the InMemory tables with Clustered Columnstore, there are a couple of additional important settings that needs to be configured to make things function. So clicking through the GUI Wizard will not set things correctly up by default.
The short answer, Niko states, is that you can only replicate non-clustered columnstore indexes at this time. I can see some use for replicating clustered columnstore tables (warehouse scale-out scenarios, perhaps), but it wasn’t at the top of my columnstore improvement list.
I stumbled upon this MSDN blog post: SQL Server 2016 SSIS Data Flow Buffer Auto Sizing capability benefits data loading on Clustered Columnstore tables (catchy title). It explains how you can set the buffer properties of the data flow to try to insert data directly into compressed row groups instead of in the delta store. They fail to achieve this using SSIS 2014 and then they explain how using the new AutoAdjustBufferSize property of SSIS 2016 works miracles and everything is loaded directly into compressed row groups. Hint: you want to avoid loading data into the delta store, as it is row storage and you need to wait for the tuple mover to load the data to the CCI in the background.
However, it’s still possible to achieve the same using SSIS 2014 (or earlier). Niko Neugebauer (blog |twitter) shows this in his post Clustered Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”). It still depends on the estimated row size, but using these settings you should get better results:
This advice is a bit different from loading standard rowstore-based tables, but serves to pack as many rows into each columnstore row group as possible.
In SQL Server 2014 we have had 18 Extended Events and with Service Pack 1 we have received 1 more to be a total of 19 Extended Events for studying the Columnstore Indexes and the Batch Mode processing. In SQL Server 2016 that number has been greatly increased – there are whooping 61 Extended Events, that will give us an important insight into the Columnstore Indexes.
Even more important, Sunil & his team have given an own category inside the Extended Events – a category that is named Columnstore, which will ease the search for the basic columnstore events. Be aware though not all Extended Events related to Columnstore Indexes are included in that category – even including all channels will give you 41 Extended Events, while hiding the other 20 Extended Events, which are sometimes not categorised at all and at other times are stored under different categories, such as Execution or Error, for example. I believe the reason behind not changing the old Extended Events category is quite simple – Microsoft always looks for avoiding breaking existing applications.
There’s a lot here to digest, so read the whole thing.
There is a new line in the properties of the iterator, showing the number of locally aggregated rows and that number equals 619255, that should be exactly the number of rows that is missing from the arrow connecting 2 iterators:
1 select 12008353 + 619255
Gives us our perfect 12627608 rows.
Is there any more information on this operation?
Indeed, just right-click on the Columnstore Index Scan and select it’s properties:
This is tied to some columnstore performance improvements in SQL Server 2016.