In SQL Server 2016, OLTP systems have received a significant improvement: support for Columnstore Indexes (disk-based Nonclustered Columnstore & In-Memory Clustered Columnstore).
In both cases the base is the underlying OLTP-style table, with a Delta-Store object (or Tail Row Group for In-Memory tables) that holds the new data being inserted or updated by the end users. The data that is frequently updated in OLTP-style systems is called Hot Data. Data that has just been inserted into your table is definitely Hot Data.
The important moment for the table is when the data becomes Cold, that is, accessed infrequently and mostly for reads, meaning that it can be compressed into Columnstore format.
This does seem interesting and can be very helpful in using columnstore indexes across different data patterns.
Since then, I’ve wondered if ColumnStore indexes (both clustered and non-clustered) might help any of these scenarios. TL;DR: Based on this experiment in isolation, the answer to the title of this post is a resounding NO. If you don’t want to see the test setup, code, execution plans, or graphs, feel free to skip to my summary, keeping in mind that my analysis is based on a very specific use case.
I actually would have been surprised to find the answer here to be “yes.” Columnstore is designed with aggregation in mind, rather than pulling out a fairly small subset of the data.
There you have it; our recommendation is to choose a batchsize of > 102400 to get benefits of minimal logging with clustered columnstore index. In the next blog, I will discuss parallel bulk import and locking optimizations.
My experience is that you really want to insert in large batches.
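To make the batch size point concrete, here is a minimal sketch of where the rows of a single bulk-insert batch would land. It is a toy model, not SQL Server code: the function name `place_batch` is made up for illustration, the 1,048,576-row cap is the maximum size of a compressed row group, and treating exactly 102,400 rows as large enough is an assumption (the quoted recommendation says "> 102400").

```python
MAX_ROWGROUP_ROWS = 1_048_576  # maximum rows in one compressed row group
DIRECT_COMPRESS_MIN = 102_400  # threshold from the quote; inclusive boundary is an assumption


def place_batch(batch_rows):
    """Return (compressed_rowgroup_sizes, delta_store_rows) for one batch.

    Hypothetical helper: models the rule that large batches are compressed
    directly while small batches (or small leftovers) go to a delta store.
    """
    if batch_rows < 0:
        raise ValueError("batch_rows must be non-negative")

    compressed = []
    remaining = batch_rows

    # Fill as many full-size compressed row groups as the batch allows.
    while remaining >= MAX_ROWGROUP_ROWS:
        compressed.append(MAX_ROWGROUP_ROWS)
        remaining -= MAX_ROWGROUP_ROWS

    # A leftover that still meets the threshold is compressed directly;
    # anything smaller is left for the delta store.
    if remaining >= DIRECT_COMPRESS_MIN:
        compressed.append(remaining)
        remaining = 0

    return compressed, remaining


if __name__ == "__main__":
    for size in (50_000, 145_000, 1_048_577):
        print(size, place_batch(size))
```

In this model a 50,000-row batch ends up entirely in a delta store, which is the case the recommendation is steering you away from.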
Niko Neugebauer has two new posts up on columnstore index changes with SQL Server 2016.
Row Group merging & cleanup is a long-awaited improvement that arrived in SQL Server 2016. Once Microsoft announced this functionality, everyone who had worked with SQL Server 2014 & Clustered Columnstore Indexes rejoiced: one of the major problems, logical fragmentation caused by deleted data, is solved! Amazing!
Just as a reminder: logical fragmentation builds up as obsolete data is marked in the Deleted Bitmap. In Columnstore Indexes, the Delete command does not directly remove data from the compressed Segments, and the Update command uses the Deleted Bitmap as well, marking old versions of rows as deleted.
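The Deleted Bitmap idea can be sketched in a few lines. This is a toy in-memory model, not how SQL Server stores anything: the class `CompressedRowGroup`, the `update` helper, and the "deleted rows as a percentage of total rows" fragmentation measure are all assumptions made for illustration.

```python
class CompressedRowGroup:
    """Toy model of a compressed row group with a deleted bitmap."""

    def __init__(self, rows):
        self.rows = list(rows)   # compressed data is never modified in place
        self.deleted = set()     # stand-in for the Deleted Bitmap

    def delete(self, pos):
        """Mark a row as deleted; the stored data itself stays put."""
        if not 0 <= pos < len(self.rows):
            raise IndexError("row position out of range")
        self.deleted.add(pos)

    def visible_rows(self):
        """Rows a query would still see after filtering by the bitmap."""
        return [r for i, r in enumerate(self.rows) if i not in self.deleted]

    def fragmentation_pct(self):
        """Share of stored rows that are only logically deleted."""
        if not self.rows:
            return 0.0
        return 100.0 * len(self.deleted) / len(self.rows)


def update(group, pos, new_value, delta_store):
    """Model an update: mark the old version deleted, add the new one elsewhere."""
    group.delete(pos)
    delta_store.append(new_value)


if __name__ == "__main__":
    group = CompressedRowGroup(range(10))
    delta_store = []
    group.delete(3)
    update(group, 5, 50, delta_store)
    print(len(group.rows), group.visible_rows(), group.fragmentation_pct())
```

The stored row count never shrinks in this model, which is the point: until something merges or rebuilds the row group, the deleted rows keep taking up space.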
Stretch DB, or Stretch Database, is a way of spreading your table between SQL Server (on-premises or in an Azure VM) and an Azure SQL Database. This means that the data of the table will be shared between the SQL Server and the Azure SQL Database, giving the opportunity to lower the total cost of local storage, since Azure SQL Database storage is cheap relative to the expensive storage typically used on local SQL Server installations.
This means that the table data will be separated into Hot Data & Cold Data, where Hot Data is frequently accessed and extremely important (typically some OLTP data) and Cold Data is rarely or almost never accessed (typically archival or log data).
For the end user the experience should be the same as before: if they ask for data that is not on the SQL Server, it will be read from Azure SQL Database by a remote query, joined with the local results (if any), and then presented to the user.
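As a rough mental model of that hot/cold split, here is a small sketch using plain Python lists. Nothing here talks to SQL Server or Azure: `split_hot_cold`, `query`, and the year-based cold predicate are hypothetical names and choices made up for illustration, and the "remote" side is just a second list.

```python
def split_hot_cold(rows, is_cold):
    """Partition rows into (hot, cold) using a caller-supplied predicate."""
    hot = [r for r in rows if not is_cold(r)]
    cold = [r for r in rows if is_cold(r)]
    return hot, cold


def query(hot, cold, predicate):
    """Return matching rows from both sides, as the end user would see them."""
    local = [r for r in hot if predicate(r)]
    remote = [r for r in cold if predicate(r)]  # stands in for the remote query
    return local + remote


if __name__ == "__main__":
    rows = [
        {"id": 1, "year": 2015},
        {"id": 2, "year": 2010},
        {"id": 3, "year": 2016},
    ]
    # Assumed rule for this example: anything before 2014 counts as cold.
    hot, cold = split_hot_cold(rows, lambda r: r["year"] < 2014)
    print(query(hot, cold, lambda r: r["id"] in (2, 3)))
```

The caller of `query` never needs to know which side a row came from, which is the user experience the quoted description is promising.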
These two posts are must-reads if you work with columnstore indexes.
Around 3.5 months ago, in September of 2015, I announced the first public release of CISL, the Columnstore Indexes Scripts Library, which gives a deeper insight into a database that uses or can use Columnstore Indexes.
Since then, I have released 4 more “point releases” with bug fixes and new features, and I have greatly expanded SQL Server support to include SQL Server 2012, SQL Server 2016 and Azure SQL Database.
If you use columnstore indexes, you absolutely want to get this. Also, there’s a brand new update out.
[I]t is not recommended to have trace flag 834 on when using columnstore indexes in your databases.
Since the 834 trace flag is a global-level flag and columnstore indexes live in individual databases, I wrote the script below to go through and check whether you have any columnstore indexes, and then check whether the trace flag is enabled.
Chris also has a helpful script to see if your instance has this issue.
SQL Server 2016 has significant advancements over SQL Server 2014 for In-Memory analytics. Some highlights are functionality (e.g. ability to create traditional nonclustered index to enforce PK/FK), performance (e.g. addition of new BatchMode operators, Aggregate pushdown), Online index defragmentation, and supportability (e.g. new DMVs, Perfmon counters and XEvents).
His post talks a little bit about in-memory, but focuses more on clustered columnstore indexes. I like that columnstore indexes are getting V3 improvements, and I think they’ll be even more useful. Whether the “in-memory” part becomes useful is a different question; I personally have seen a very limited adoption of In-Memory OLTP (and a few huge bugs for the people brave enough to try it).
This result was observed right after the loading script finished, where we can clearly see 4 Delta-Stores for 10 Million Rows. 3 of the Delta-Stores are Closed and 1 Delta-Store is Open, which is an absolutely impossible combination if we think about Clustered Columnstore Indexes, where one would expect to have 10 Compressed Row Groups or 10 Delta-Stores (9 Closed & 1 Open).
If you take a more detailed look at the sizes of the closed Delta-Stores, you will see that they increase each time a new Delta-Store is used. For example, the first one is capped at 1.048.567 Rows, the second one is capped at 2.097.152 and the last closed Delta-Store is set to 4.193.904 Rows, meaning that the size roughly doubles each time.
I’d like to see this as the first step toward expanded sizes for compressed rowgroups.
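A quick arithmetic check on the quoted Delta-Store sizes shows how close to doubling they are. The row counts below are copied as written in the quote; the size of the open Delta-Store is my own inference, assuming all 10 million rows sit in these four Delta-Stores, which the quote implies but does not state outright.

```python
# Closed Delta-Store caps as quoted above (not independently verified).
closed_caps = [1_048_567, 2_097_152, 4_193_904]
total_loaded = 10_000_000


def growth_ratios(caps):
    """Ratio of each cap to the one before it."""
    return [later / earlier for earlier, later in zip(caps, caps[1:])]


ratios = growth_ratios(closed_caps)

# Assumption: whatever is not in a closed Delta-Store is in the open one.
open_rows = total_loaded - sum(closed_caps)

if __name__ == "__main__":
    print([round(r, 4) for r in ratios])
    print(open_rows)
```

Both ratios come out within a fraction of a percent of 2, so "doubling" is a fair description even though the numbers are not exact powers of two.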