Press "Enter" to skip to content

Category: Columnstore

Clustered Columnstore Index Online Rebuild

Niko Neugebauer looks at a feature which will pop up in SQL Server vNext:

The current state of the Clustered Columnstore Index ONLINE rebuild points to be an unfinished version, which will definitely get vastly improved before being released & supported in SQL Server. I have seen a couple of deadlocks and canceled transactions and so I decided that this blog post will get updated as soon as there will be an official announcement of this feature.
If you are still looking to start working on this feature, then I would suggest trying it on smaller tables. Like really, really small ones.
Oh, and for online rebuild operation focus on using partition rebuild – you are using the partitioning, right ? 🙂

Niko gave this a try in Azure SQL Database, as there is no publicly available version of SQL Server which supports this.  I’ve been waiting for this feature for 3 years now, so I’ll be happy to see it in production.

Comments closed

Trickle Insertion With Clustered Columnstore Indexes

Sunil Agarwal provides a pattern for trickle loading clustered columnstore indexes:

A traditional scenario of loading data into CCI is a nightly load from one or more data files containing millions of rows. Recommended technique is to load the data with batchsize >= 102400 as explained https://blogs.msdn.microsoft.com/sqlserverstorageengine/2014/07/27/clustered-column-store-index-bulk-loading-the-data/. However, we are seeing many scenarios where data source is parallel data stream (i.e. trickle insert) to be loaded to CCI for analytics, a typical IOT scenario. CCI allows concurrent data streams into the same delta rowgroup. However, you will see higher page latch contention as you increase the concurrency. You may wonder why this is so? Well, each delta RG is organized internally as a clustered btree index and the dataload follows the pattern of monotonically increasing clustered index key causing latch contention on the last page.

Check out Sunil’s post.  I also have an older post covering my experiences with CCI trickle loads and three ETL patterns which can work.

Comments closed

Wait Stats Related To Columnstore Indexes

Niko Neugebauer has some documentation on important wait stats around columnstore:

I split the known wait types into the following distinct groups:
– Batch Execution Mode (HTBUILD, HTDELETE, HTMEMO, HTREINIT, HTREPARTITION, PWAIT_QRY_BPMEMORY, BPSORT)
– Columnstore Indexes (ROWGROUP_VERSION, ROWGROUP_OP_STATS, SHARED_DELTASTORE_CREATION, COLUMNSTORE_BUILD_THROTTLE, COLUMNSTORE_MIGRATION_BACKGROUND_TASK, COLUMNSTORE_COLUMNDATASET_SESSION_LIST)

With an appearance in the next SQL Server version of the Batch Execution Mode for the RowStore Indexes, the first group of the waits will suddenly be becoming more important for every single SQL Server user and mixing it together with the Wait Types specific to the internal structures and functions of the Columnstore Indexes makes no sense.

Read on to learn more about these important wait types.

Comments closed

Reserved Memory Allocation Waits And Trace Flag 834

Joe Obbish has another post looking at sub-optimal columnstore index performance:

It is possible to see a scalability bottleneck in the form of high average wait time for the RESERVED_MEMORY_ALLOCATION_EXT wait if a highly concurrent workload is run on a server that consumes memory with batch mode operators. I believe that the severity of the bottleneck depends on unknown factors in the server’s initial memory state and the rate of memory actually used by queries to run batch mode operations. This blog post shares a reproduction of the issue along with a call to action.

If you use clustered columnstore indexes, check out Joe’s User Voice entry.

Comments closed

Columnstore And Merge Replication

Niko Neugebauer tests whether merge replicated tables can use columnstore indexes:

Adding this table to the publication will end up with the following, self-explaining error message, being very clear that the Clustered Columnstore Indexes are not supported for the Merge Replication[.]

There is no surprise here, as the same Clustered Columnstore Indexes are not supported for the Transactional Replication, but I feel that a great opportunity is lost and the Replication technology are being quite ignored by the emerged technologies, such as In-Memory & Columnstore, where the scenarios of replicating the Data Warehousing data is something that a lot of people can find very useful.

I wish it would be otherwise, and this would allow to bring more customers to use Columnstore Indexes.

Clustered columnstore indexes aren’t possible, but read on to learn whether non-clustered columnstore indexes are supported.

Comments closed

Trace Flag 834 And Columnstore Tables

Joe Obbish shows how trace flag 834 can solve a bottleneck when inserting into tables with clustered columnstore indexes:

In my experience, when we get into a situation with high memory waits caused by too much concurrent CCI activity all queries on the server that use a memory grant can be affected. For example, I’ve seen sp_whoisactive run for longer than 90 seconds.

It needs to be stated that not all CCIs will suffer from this scalability problem. I was able to achieve good scalability with some artificial tables, but all of the real target tables that I tested have excessive memory waits at high concurrency. Perhaps tables which require more CPU to compress naturally spread out their memory requests and the underlying OS is better able to keep up.

Read the whole thing, and also check out Lonny Niederstadt’s comment as it adds pertinent information about TF834.

Comments closed

Batch Mode Memory Fractions

Joe Obbish explains what memory fractions are and how incorrect calculations can lead to tempdb spills:

There’s very little information out there about memory fractions. I would define them as information in the query plan that can give you clues about each operator’s share of the total query memory grant. This is naturally more complicated for query plans that insert into tables with columnstore indexes but that won’t be covered here. Most references will tell you not to worry about memory fractions or that they aren’t useful most of the time. Out of thousands of queries that I’ve tuned I can only think of a few for which memory fractions were relevant. Sometimes queries spill to tempdb even though SQL Server reports that a lot of query memory was unused. In these situations I generally hope for a poor cardinality estimate which leads to a memory fraction which is too low for the spilling operator. If fixing the cardinality estimate doesn’t prevent the spill then things can get a lot more complicated, assuming that you don’t just give up.

Extremely interesting post.

Comments closed

Digging Into The In-Memory Columnstore Location

Niko Neugebauer does some investigation into where, exactly, memory-optimized columnstore data goes:

This is a rather simple blog post that is dedicated to the theme of the In-Memory Columnstore Indexes location. This has been a constant topic of discussion over a long period of time, even during the public events – and there is a need to clear out this topic.

I have assumed that the In-Memory Columnstore structures (Segments, Dictionaries, …) are located in the In-Memory, but there have been voices that I greatly respect, pointing that actually the Columnstore Object Pool is the exact location of any Columnstore structures, and there is nothing better than to take this feature for a ride and see what the SQL Server engine is actually doing.

Niko shows off a couple of useful DMVs along the way, too.

Comments closed

Columnstore Functionality Per Edition

Niko Neugebauer looks at how columnstore indexes differ between SQL Server Standard Edition, Express Edition, and Enterprise Edition:

One rather small (relatively other features, as I imagine), but an incredibly useful improvement was described in Columnstore Indexes – part 109 (“Trivial Plans in SQL Server 2017”) – is the ability to automatically produce Fully Optimised execution plans for the Database, which compatibility level is set to 140.

Running on both instances (Standard & Express), the following script, while altering the compatibility level between 140 (SQL Server 2017) & 130 (SQL Server 2016), will produce different execution plan for the SELECT COUNT_BIG(*) operation – the fast one (with FULL optimisation in 140 compatibility level) and slow one (with TRIVIAL optimisation in 130 compatibility level):

I am happy that this feature has got no Edition dependence, this is a needed improvement that simply increases the value of the offer and can actually be achieved in a lot of different ways, event without parallelism kicking in.

Niko has also helpfully provided a table at the end of the post to summarize his findings.

Comments closed

Columnstore Indexes And Partition Operations

Niko Neugebauer continues his columnstore index series, this time looking at how partitioned tables behave:

Let’s start with a simple test of merging the 2007 partition with the year 2008, by issuing the following command:

It might ready you a reasonably huge surprise, but this command will fail, if you are using the Columnstore Indexes.

The very same command will function without any problem, if we would simply avoid creating Clustered Columnstore Index …
The reason behind this limitation has to do with the fact that Columnstore Indexes do not sort or control the boundaries of the data, and this is biting the total implementation in such operations.

It’s an interesting read, and a little disappointing.

Comments closed