
Category: Columnstore

Getting A Grip On Columnstore

Melissa Coates has a nice introduction to columnstore indexes:

Many of the in-memory features in the Microsoft platform rely on the xVelocity (formerly known as VertiPaq) engine. The implementations do differ somewhat between products, such as in the requirements for data to be truly memory-resident.

The remainder of this post will focus on the two columnstore technologies in SQL Server: clustered and nonclustered.

If you’re not very familiar with columnstore and aren’t quite ready to tackle Niko Neugebauer’s columnstore series, this is a good way to get started.

HTDELETE Wait Type

Joey D’Antoni troubleshoots a query with excessive HTDELETE waits:

Ultimately I think any thought of the readable secondary having a vastly different plan was a red herring. Statistics are going to be the same on both instances, and if there were a missing statistic on the secondary, SQL Server would create it in TempDB. Anyway, columnstore indexes don’t use statistics in the traditional sense.

Fortunately I was able to catch a query in the process of waiting on HTDELETE, so I no longer had to look for the needle in the haystack, and I could get to tuning the plans. I was able to grab the SELECT part of the query and generate an estimated plan on both the primary and secondary nodes. The plans were virtually the same on both nodes, with just a minor difference in memory grant between them.

Click through for the solution.

High-Compression JSON With vNext

Jovan Popovic gives an example of 25x compression of JSON data using a clustered columnstore index in vNext:

CLUSTERED COLUMNSTORE INDEXES provide extreme data compression in SQL Server and Azure SQL Database. With NVARCHAR(MAX) support in CCI indexes, you can use them on JSON data stored in your database and get high (25x) compression. Therefore, CCI is a perfect solution if you need to store a large volume of JSON data in your SQL Database.

The ContosoDW database is publicly available for download, so you can use it and the script below to re-create this table and try this in your environment.

I’m curious whether this will also apply to non-JSON data.

JSON In Clustered Columnstore Indexes

Jovan Popovic gives a use case for JSON data being part of a clustered columnstore index:

This is equivalent to collections that you might find in a classic NoSQL database, because they store each JSON document as a single entity and optionally create indexes on these documents. The only difference is the CLUSTERED COLUMNSTORE index on this table, which provides the following benefits:

  1. Data compression – CCI uses various techniques to analyze your data and choose optimal compression algorithms to compress data.

  2. Batch mode analytics – queries executed on CCI process rows in batches of 100 to 900 rows, which might be much faster than row-mode execution.

I think it’s worth reading this in conjunction with Niko Neugebauer’s comments regarding strings in columnstore.
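
Columnstore compression is usually described as a mix of dictionary encoding and run-length encoding applied per column segment; which encodings SQL Server actually picks for a given segment is an internal decision of the engine. As a minimal sketch of why repetitive columns compress so well, assuming a sorted, low-cardinality column, here is run-length encoding in Python (the function names and sample values are hypothetical, not part of SQL Server):

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]

# Hypothetical column segment: a sorted, low-cardinality column.
segment = ["DE", "DE", "DE", "FR", "FR", "PT", "PT", "PT", "PT"]
runs = run_length_encode(segment)
print(runs)  # [('DE', 3), ('FR', 2), ('PT', 4)]

# Decoding must give back exactly the original values.
assert run_length_decode(runs) == segment
```

The nine input values collapse into three pairs. The longer the runs, the bigger the win, which is one reason the sort order of the loaded data matters for columnstore segment quality.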

Columnstore And Strings

Niko Neugebauer has a great knowledge dump regarding strings in columnstore indexes:

Having strings in fact tables is quite normal, but to be honest, in most cases it does not make a lot of sense, since we are trying to keep there the information that can be calculated and/or aggregated. Notice that I have written “in most cases” and NOT “in all cases”, because there are some notable exceptions. Additionally, if you are “feeding” SSAS Tabular from your table, it might be much easier to do it directly (hey, there is a solution through views for that, I was told :)).

In this blog post, I am focusing not on the exceptions but on the typical cases where it’s not the best option, and so I want to present an optimised structure, which contains a tinyint column referring to a new table with the distinct ShipMode values.

The string experience with columnstore can be troublesome.  It’s great for numeric values, but less great for strings.
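
Niko’s fix replaces the ShipMode string in the fact table with a tinyint key into a separate table of distinct values. A minimal Python sketch of that mapping, assuming the distinct values fit in a tinyint (0 to 255); `build_lookup` and the sample data are hypothetical:

```python
def build_lookup(values, max_codes=256):
    """Map each distinct string to a small integer code.

    max_codes=256 mirrors the 0-255 range of a SQL Server tinyint.
    Returns the string->code lookup and the encoded column.
    """
    lookup = {}
    codes = []
    for value in values:
        if value not in lookup:
            if len(lookup) >= max_codes:
                raise ValueError("too many distinct values for a tinyint key")
            lookup[value] = len(lookup)  # next unused code
        codes.append(lookup[value])
    return lookup, codes

# Hypothetical fact-table column of ShipMode strings.
ship_modes = ["AIR", "MAIL", "AIR", "TRUCK", "MAIL", "AIR"]
lookup, codes = build_lookup(ship_modes)
print(lookup)  # {'AIR': 0, 'MAIL': 1, 'TRUCK': 2}
print(codes)   # [0, 1, 0, 2, 1, 0]

# Reverse mapping: the "join back" to the distinct-values table.
decode = {code: value for value, code in lookup.items()}
assert [decode[c] for c in codes] == ship_modes
```

In SQL Server the lookup would become its own table joined back to the fact table; the sketch only shows why the fact column shrinks to one small integer per row.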

NCCI Online Rebuild

Niko Neugebauer notes a big improvement in vNext:

For Columnstore Indexes, the only online operation available so far was Row Group merging and removal with ALTER INDEX REORGANIZE (as well as the Tuple Mover operations). With the appearance of HTAP scenarios (Hybrid Transactional Analytical Processing, aka Operational Analytics) in SQL Server 2016, there was a huge need for online index maintenance to make sure that the operational part of HTAP runs smoothly. For any online business, taking the application down for an hour means losing real money and, even worse, losing credibility with customers. To my knowledge, Microsoft was very much aware of this and was working on improving this missing part.
For the SQL Server vNext version (after SQL Server 2016), in CTP 1.2 released yesterday, we have finally received the first online rebuild operation for Columnstore Indexes – in this case for Nonclustered Columnstore Indexes – and this is huge news for anyone using HTAP scenarios.
Naturally, this feature is Enterprise Edition only, and as ever – if you are running a critical workload, you need to step up and use Enterprise Edition.

Online clustered columnstore reorganization in 2016 was a life-saver, and I’m looking forward to online clustered columnstore rebuilding at some point in the future.

More On Columnstore Batch Mode

Sunil Agarwal talks about batch mode processing with columnstore indexes:

While these results may not appear as dramatic on my laptop, the picture below shows the performance gains with Window Aggregates on a server-class machine with a large DW database. The orange bar represents the query speedup we got with the Window Aggregate operator in Batch Mode. The highest speedup we saw was 289x!!

Batch mode is generally a huge benefit for data warehousing environments.
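
Jovan Popovic’s post above describes batch mode as processing rows in batches of roughly 100 to 900 rows instead of one at a time. The Python sketch below only illustrates that data flow, assuming a 900-row batch; it says nothing about SQL Server’s actual implementation, and in Python it won’t show the speedup, which in the engine comes largely from paying operator overhead once per batch instead of once per row. `batches` and `batch_sum` are hypothetical names:

```python
from typing import Iterable, Iterator, List

# Assumption: upper end of the 100-900 row range quoted earlier.
BATCH_SIZE = 900

def batches(rows: Iterable[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Group a row stream into lists of at most `size` rows."""
    batch: List[int] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch

def batch_sum(rows: Iterable[int]) -> int:
    """Aggregate one batch at a time: one operator call per batch, not per row."""
    return sum(sum(batch) for batch in batches(rows))

print(batch_sum(range(10_000)))  # 49995000
```

Here `batch_sum(range(10_000))` feeds the aggregate 12 batches (eleven of 900 rows and one of 100) and returns the same total as a row-by-row sum.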

Quickstats Craziness

Lonny Niederstadt takes an in-depth look at the intersection of several features and discusses an issue:

Here’s the Connect Item involved – written by Michael Swart.

Trace flag 2390 can cause large compile time when dealing with filtered indexes. (Active)
https://connect.microsoft.com/SQLServer/feedback/details/2528743/

Huh.  “What does that have to do with clustered columnstore indexes?” you might ask.  No, really, it’d be a personal favor if you *did* ask…

Ask and then read on.  This is the kind of post I’d send to someone wanting to learn how to troubleshoot issues.

Columnstore Query Patterns

Niko Neugebauer gives a couple of hints on how to make columnstore queries as fast as possible:

On my VM with 4 cores, this query takes 33 seconds to execute on SQL Server 2016 with Service Pack 1, while burning almost 48 seconds of CPU time.
The relevant part of the execution plan can be found below. It shows many of the performance problems this query suffers from, such as the INNER LOOP JOIN and the INDEX SPOOL, besides an even worse problem that is hidden and only becomes visible once you open the properties of any operator in the lower tree (left side of the LOOP JOIN): it all actually runs in Row Execution Mode.

To show you the problem, on the left side you will find the properties of the sort iterator in the lower (left) part of the LOOP JOIN, which was executed around 770,000 times in Row Execution Mode, effectively taking away any chance for this query to execute quickly. One might argue that it might be more effective to do the loop part in Row Mode, but given that we are sorting around 3.1 million rows there, for me there is no doubt that it would be faster in Batch Execution Mode. Looking at the last sort iterator in the execution plan (TOP N SORT), you will find that it runs in Batch Execution Mode, even though it processes around 770,000 rows.

There’s some valuable information here.

Columnstore Partitioning

Niko Neugebauer warns against partitioning small tables with clustered columnstore indexes:

Needless to say, looking at the execution plans you notice that the actual execution plans show a 10 times difference between them, even though both tables contain the very same data!
The query cost for the partitioned table is staggering – it is around 10 times bigger (~8.8 vs ~0.81 for the first query).
The execution times partly reflect this situation: 12 ms vs 91 ms. The non-partitioned table performs almost 9 times faster overall, and the spent CPU time reflects it: 15 ms vs 94 ms. Remember that both tables are based on Columnstore Indexes! Partitioning your table the wrong way carries a huge penalty that might not be directly detectable through the execution plans of complex queries. Well, you might want to use CISL (the Columnstore Indexes Scripts Library), just saying.

If you can’t fill a single rowgroup, your partition is too granular.  Even then, I’d like to see double-digit rowgroups per partition, though that’s just me.
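
To put numbers on that rule of thumb: a compressed rowgroup holds at most 1,048,576 rows, so a quick check divides each partition’s row count by that limit. A minimal Python sketch, assuming rows spread evenly across partitions and perfectly full rowgroups (real loads often produce smaller ones); the function names are hypothetical:

```python
import math

# Documented maximum number of rows in one compressed columnstore rowgroup.
MAX_ROWGROUP_ROWS = 1_048_576

def rowgroups_per_partition(total_rows: int, partitions: int) -> int:
    """Best-case rowgroups per partition: even spread, every rowgroup full."""
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    rows_per_partition = total_rows / partitions
    return math.ceil(rows_per_partition / MAX_ROWGROUP_ROWS)

def partition_too_granular(total_rows: int, partitions: int) -> bool:
    """True when an even split cannot fill even one full rowgroup per partition."""
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    return total_rows / partitions < MAX_ROWGROUP_ROWS

print(rowgroups_per_partition(10_000_000, 10))   # 1
print(partition_too_granular(10_000_000, 10))    # True
print(rowgroups_per_partition(100_000_000, 4))   # 24
```

For example, 10 million rows in 10 partitions leaves each partition just short of one full rowgroup, while 100 million rows in 4 partitions gives about 24 rowgroups per partition, comfortably in double digits.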
