Minimal Logging With Columnstore

Niko Neugebauer continues his columnstore series by looking at columnstore insert logging in SQL Server 2016 versus 2014:

Ladies and gentlemen! That’s quite a difference to SQL Server 2014!
We better check the total length of the transaction log to see the final result: 384.032 bytes! Ok, that is significantly more than for the rowstore heap table for sure, but what about the comparison to the SQL Server 2014 ? Did this minimal logging bring any improvement ?
Well … 🙂
In SQL Server 2014 we had 1.255.224 bytes spent on the transaction log – meaning over 1.2 MB, meaning around 3 times more, for the Delta-Store insertion! For such a simple table, this is a huge improvement, but let’s take a look at the total length of the transaction log entries in both environments (SQL Server 2014 & SQL Server 2016)

This is worth a careful read.  If you’ve spent time working with 2014 clustered columnstore indexes, there are a few changes which might affect you.  The most interesting thing for me was that the deltastore is no longer page compressed.

Nonclustered Columnstore Indexes On Indexed Views

Niko Neugebauer notes that non-clustered columnstore indexes can now sit on top of indexed views, as of SQL Server 2016:

From the perspective of the disk access, this is where you will definitely win at least a couple of times with the amount of the disk access while processing the information, amount of memory that you will need to store and process (think hashing and sorting for the late materialisation phases), and you will pay less for the occupied storage.

Another noticeable thing was that the memory grants for the Indexed Views query was smaller compared to the query that was processing the original columnstore table FactOnlineSales.

Clustered indexes are currently not available as an option; we’ll see if that changes in the next version of SQL Server.

Columnstore Improvements

Warner Chaves discusses improvements in columnstore indexes in SQL Server 2016:

SQL Server first introduced Columnar Storage with the SQL 2012 Enterprise release. In this release, Columnstores were read-only indexes, so it required to drop the index, load the table or partition and then rebuild the index to refresh it with the latest data.

SQL Server 2014 upgraded Columnstores with full read-write capabilities, allowing the Columnstore to become the ‘clustered’ index for the table and hold all the data instead of just being one more index on top of row-organized data. 2014 also introduced many improvements to batch operations so more pieces of an execution plan could take advantage of this faster processing mode.

Read on to see changes in 2016.

CISL 1.3.1

Niko Neugebauer has released the newest version of his columnstore index library:

Here is the small description of what is new in this release:

  • The database snapshots (.dacpac) for all platforms are now included in the Releases\DacPacs.

  • Includes new Powershell functions for installing and removing CISL from the Instances:

    • Install-CISL.ps1 will allow you to install the CISL at multiple databases of a SQ Server Instance (or Azure SQLDB).
    • Remove-CISL.ps1 will allow you to remove the CISL from the multiple databases of a SQL Server Instance (or Azure SQLDB).
  • Support for the different collation is included.

  • Includes information on all recent SQL Server updates.

  • Included support of the new Columnstore Indexes Trace Flags in SQL Server 2016.

  • Basic Unit Tests (based on t-sqlt) are included for SQL Server 2012 & SQL Server 2014, guaranteeing the quality of the released code.

  • A good number of bug fixes.

  • Further parameter enhancements for the existing functions.

Sounds like there’s a lot packed into this release.

New Columnstore Trace Flags

Niko Neugebauer looks at a few trace flags which modify columnstore index behavior:

Starting with SQL Server 2016, if you have enough RAM and suffering from the TempDB Spills that do have a significant impact on your workload, then you can enable the Trace Flag 9389 that will enable Batch Mode Iterators to request additional memory for the work and thus avoiding producing additional unnecessary I/O.

I am glad that Microsoft has created this functionality and especially that at the current release, it is hidden behind this track, and so Microsoft can learn from the applications before enabling it by default, hopefully in the next major release of SQL Server.

There’s a lot of good stuff in here.  Read the whole thing.

CISL 1.3.0

Niko Neugebauer has released version 1.3.0 of his Columstore script library:

I am extremely proud to share with everyone the news that the long-awaited and quite overdue release of the CISL – Columnstore Indexes Scripts Library is finally public – the 1.3.0 version. The most important part of this release is the support of the SQL Server 2016 & Azure SQLDatabase – all 3 scenarios (Nonclustered Columnstore, Disk-based Clustered Columnstore & the Memory-Optimised Clustered Columnstore Indexes) are included.
You will be able to explore all the important new architecture objects, such as Deleted Buffer & Deleted Table, plus the scripts for every version supports the new output results, even though there were no In-Memory tables in SQL Server 2012 for example.

If you use columnstore indexes, check this out.

Parallel Loading Columnstore Indexes

Sunil Agarwal shows how to bulk load with parallelism into a clustered columnstore index from a staging table:

SQL Server 2016 requires following conditions to be met for parallel insert on CCI

  • Must specify TABLOCK
  • No NCI on the clustered columnstore index
  • No identity column
  • Database compatibility is set to 130

While these restrictions are enforced in SQL Server 2016 but they represent important scenarios. We are looking into relaxing these in subsequent releases. Another interesting point is that you can also load into ‘rowstore HEAP’ in parallel as well.

The restriction I’d most like to see reduced would be the “no non-clustered indexes” part.  The rest seem forgivable for most clustered columnstore setups (i.e., fact tables).

Columnstore Basics

Sunil Agarwal has a couple of posts explaining columnstore indexes.  First, how columnstore indexes differ from classic B-tree indexes:

  • Index Fragmentation: For rowstore based indexes, it is considered fragmented if (a) the physical order of pages in out of sync with the index-key order. (b) the data pages (clustered index) or index pages (for nonclustered index) are partially filled. A fragmented index will lead to significantly higher physical IOs and can potentially put more pressure on memory which can ultimately slowdown queries. Most organizations run a periodic index maintenance job to defragment indexes. For details, please refer to best practices on how to maintain btree indexes. For columnstore index, an index fragmentation is considered fragmented if (a) there are 10% or more rows marked as deleted in a compressed rowgroup (b) one or more smaller compressed rowgroups can be combined to create a larger compressed rowgroup such that the resultant compressed rowgroup has less than or equal to 1 million rows. Note, if a compressed rowgroup has less than 1 million rows due to dictionary size, it is not considered fragmented because there is nothing that can be done to increase its size.  Also recall that a columnstore index consists of zero or more delta rowgroups as shown the in the picture below.

Also, clustered and non-clustered columnstores:

SQL Server 2016 provides two flavors of columnstore index; clustered (CCI) and nonclustered (NCCI) columnstore index. As shown in the simplified picture below, both indexes are organized as columns but NCCI is created on an existing rowstore table as shown on the right side in the picture below while a table with CCI does not have a rowstore table. Both tables can have one or more btree nonclustered indexes.

If you haven’t looked at columnstore indexes yet, 2016 is a great time to start.

Columnstore Batch Mode Changes

Niko Neugebauer discusses how Batch Execution Mode has changed since SQL Server 2014:

I will share a little secret with you – it’s all about the Batch Execution Mode in SQL Server 2014: all those Hash Match iterators are running in Batch Mode, even though we are not using Columnstore Index anywhere.
In SQL Server 2016 this old (since 2012) functionality has been removed and once you are running your queries in the compatibility level of 130 (SQL Server 2016), your queries that were taking advantage of it – will be running significantly slower.

There is a fast & brutal solution for that problem – set your compatibility level to 120, but do not go there until you have understood all the implications: some of the most important and magnificent improvements for the Batch Execution Mode are functioning only if your database is set to compatibility level 130: single threaded batch mode, batch sorting, window functions, etc.
From what I know, there is no way you can have all of those functionalities working together under the same hood and enjoy the old way of getting Batch Execution Mode without the presence of the Columnstore Index.

The conclusion is a bit of a downer.  Read the whole thing.

Columnstore With Integer Sequences

Niko Neugebauer talks about handling sequences and default values within columnstore indexes:

There are still no dictionaries – and trying to rebuild this table will not bring any effect at all, but take a look at the size of the segments – their size was lowered for almost 40% to ~1.6 MB!

This technic is very effective if you are compressing the columns that you do access rarely – it should be considered for the log tables for example.
Also notice that Columnstore Archival compression will not bring any significant changes – the original 2.6 MB will lower to 2.42 while the variable char column will not get any further improvements, making the improvement difference around 32%.

Warning: Do NOT use this technic without understanding the consequences – the processing of such columns will lower their effectiveness, since Predicate Pushdown will work in a very limited way, plus the Segment Elimination will not work at all.



July 2017
« Jun