Press "Enter" to skip to content

Category: Columnstore

Columnstore Parallelism

Sunil Agarwal shows how clustered columnstore indexes take advantage of parallelism:

Do you need to be concerned about that a delta rowgroup is scanned single threaded? The answer is NO for two reasons (a) most columnstore indexes have very few delta rowgroups (b) if you have multiple delta rowgroups, they can be scanned in parallel with one thread per delta rowgroup

I have a beef with (a), at least for SQL Server 2014, but that’s a story for another day.

Sunil has a follow-up post on parallel bulk import:

Recall that on rowstore tables (i.e. the tables organized as rows not as columnstore), SQL Server requires you to specify TABLOCK for parallel bulk import to get minimal logging and locking optimizations. One key difference for tables with clustered columnstore index is that you don’t need TABLOCK for getting locking/logging optimizations for bulk import. The reasons for this difference in behavior is that each bulk import thread can load data exclusively into a columnstore rowgroup. If the batch size < 102400, then the data is imported into a delta rowgroup otherwise a new compressed rowgroup is created and the data is loaded into it. Let us take two following interesting cases to show this bulk import behavior. Assume you are importing 4 data files, each with one bulk import thread, concurrently into a table with clustered columnstore index

The “don’t use TABLOCK” is interesting in comparison to rowstore tables.

Comments closed

Columnstore And SSIS 2016

Niko Neugebauer mentions that with SQL Server 2016 and Integration Services 2016, clustered columnstore index insertion can get much faster:

To solve the performance problem I went straight to the DefaultBufferMaxRows setting and set it to be equal of the maximum number of rows in a Row Group – 1048576. Together with the AutoAdjustBufferSize setting it helps the actual current size of the DataFlow Buffer that will be used for transferring the data from the source to the destination table.

What should I say – it worked like magic:
I guess that with 2:09 Minutes the clear winner of this test is the configuration with AutoAdjustBufferSize set to True and the DefaultBufferMaxRows to 1048576. It took less then a half of the time with just AutoAdjustBufferSize activated and the insertion process was executed with the help of the Bulk Load API – meaning that we did not have to wait for the Tuple Mover or to execute it manually.

Doubling insertion performance is nothing to scoff, especially for something like columnstore tables, where we expect millions (or more) of rows to be inserted.

Comments closed

Columnstore Index Compression Delay

Niko Neugebauer found a “compression delay” option on columnstore indexes in SQL Server 2016 CTP 3.2:

In SQL Server 2016 the OLTP Systems have received a significant improvement – support for the Columnstore Indexes (disk-based Nonclustered Columnstore & In-memory based Clustered Columnstore).
In both cases we have as the base the underlying OLTP-style table, with a Delta-Store object (or Tail Row Group for InMemory tables), that will hold the new data being inserted or updated by the final users. The data that is being frequently updated in OLTP-style systems is called Hot Data. The data that just being inserted into your table is definitely a Hot Data.
The important moment for the table is when the data becomes Cold or mostly infrequently read-accessed, and meaning that it can be compressed into Columnstore format.

This does seem interesting and can be very helpful in using columnstore indexes across different data patterns.

Comments closed

Columnstore Pagination

Aaron Bertrand investigates whether columnstore indexes help with pulling out specific “pages” of data:

Since then, I’ve wondered if ColumnStore indexes (both clustered and non-clustered) might help any of these scenarios. TL;DR: Based on this experiment in isolation, the answer to the title of this post is a resounding NO. If you don’t want to see the test setup, code, execution plans, or graphs, feel free to skip to my summary, keeping in mind that my analysis is based on a very specific use case.

I actually would have been surprised to find the answer here to be “yes.”  Columnstore is designed with aggregation in mind, rather than pulling out a fairly small subset of the data.

Comments closed

Columnstore In 2016

Niko Neugebauer has two new posts up on columnstore index changes with SQL Server 2016.

First, row group merging with clustered columnstore indexes:

Row Group merging & cleanup is a very long waited improvement that came out in SQL Server 2016. Once Microsoft has announced this functionality, everyone who has worked with SQL Server 2014 & Clustered Columnstore Indexes has rejoiced – one of the major problems with logical fragmentation because of the deleted data is solved! Amazing!
Just as a reminder – logical fragmentation is the process when we mark obsolete data in the Deleted Bitmap (in Columnstore Indexes there is no direct data removal from the compressed Segments with Delete command and Update command uses Deleted Bitmap as well marking old versions of rows as deleted).

Second, Stretch DB with columnstore:

Stretch DB or alternatively Stretch Database is a way of spreading your table between SQL Server (on-premises, VM in Azure) and a Azure SQLDatabase. This means that the dat of the table will shared between the SQL Server and the Azure SQLDatabase giving the opportunity to lower the total cost of the local storage, since Azure SQLDatabase is cheap relatively expensive storage typically used on the local SQL Server installations.
This mean that the table data will be separated intoHot Data & Cold Data, where Hot Data is the type of data that is frequently accessed and it extremely important (this is typically some OLTP data) and the Cold Data (this is typically rarely or almost never accessed archival or log data).
For the final user the experience should be the same as before – should he ask for some data that is not on the SQL Server, then it will be read from Azure SQLDatabase by the invocation of remote query, joined with the local results (if any) and then presented to the user.

These two posts are must-reads if you work with columnstore indexes.

Comments closed

CISL

Niko Neugebauer talks about the Columnstore Indexed Scripts Library:

Around 3.5 Months ago in September of 2015, I have announced the first public release of the CISL – Columnstore Indexes Scripts Library, which allows to have a deeper insight into the database that uses or can use Columnstore Indexes.
Since that, I have released 4 more “point releases” with bug fixes and new features, I have greatly expanded the support of SQL Server with inclusion of SQL Server 2012, SQL Server 2016 and Azure SQLDatabase.

If you use columnstore indexes, you absolutely want to get this.  Also, there’s a brand new update out.

Comments closed

Trace Flag 834 And Columnstore

Chris Bell warns us against having Trace Flag 834 turned on in an instance which contains columnstore indexes:

[I]t is not recommended to have trace flag 834 on when using columnstore indexes in your databases.

Since the 834 trace flag is a global level flag, and columnstores are in individual databases I wrote the script below to go through and check if you ave any columnstore indexes, and then check if the trace flag is enabled.

Chris also has a helpful script to see if your instance has this issue.

Comments closed

In-Memory Analytics

Sunil Agarwal introduces us to In-Memory Analytics, forthcoming in SQL Server 2016:

SQL Server 2016 has significant advancements over SQL Server 2014 for In-Memory analytics. Some highlights are functionality (e.g. ability to create traditional nonclustered index to enforce PK/FK), performance (e.g. addition of new BatchMode operators, Aggregate pushdown), Online index defragmentation, and supportability (e.g. new DMVs, Perfmon counters and XEvents).

His post talks a little bit about in-memory, but focuses more on clustered columnstore indexes.  I like that columnstore indexes are getting V3 improvements, and I think they’ll be even more useful.  Whether the “in-memory” part becomes useful is a different question; I personally have seen a very limited adoption of In-Memory OLTP (and a few huge bugs for the people brave enough to try it).

Comments closed

Increased Delta Store Size For NCIs

Niko Neugebauer, in part 73 of his ongoing columnstore index series:

This result was observed right after the finish of the loading script, where we can clearly see 4 Delta-Stores for 10 Million Rows. 3 of the Delta-Stores are Closed and 1 Delta-Store is Open, which is an absolutely impossible combination if we think about Clustered Columnstore Indexes, where one would expect to have 10 Compressed Row Groups or 10 Delta-Stores (9 Closed & 1 Open).
If you take a more detailed look at the associated sizes of the closed Delta-Stores, you will see that they increase each time a new Delta-Store is being used. For example, the first one is capped at 1.048.567 Rows, the second one is capped at 2.097.152 and the last closed Delta-Store is set to 4.193.904 Rows – meaning that the size is being constantly doubled.

I’d like to see this as the first step toward expanded sizes for compressed rowgroups.

Comments closed