Partitioned Columnstore Tables

Denny Cherry makes an important point about dealing with columnstore tables:

ColumnStore indexes are all the rage with data warehouses. They’re fast, they’re new(ish) and they solve all sorts of problems when dealing with massive amounts of data.  However they can cause some issues as well if you aren’t very careful about how you setup your partitions on the ColumnStore index.  This is because, you can’t split a ColumnStore partition once it contains data.

Now, if everything is going according to plan you create your partitions well in advance and there’s no issues.

However, if everything hasn’t gone according to plan and someone forgets to create the partitions and you end up with rows in the final partition, you can’t create any more partitions because you can’t split the partition.

Ideally, you get those ducks in a row first.  Keep reading for a repro script and a couple potential workarounds.

Columnstore Partitioning In SQL Server 2016

Niko Neugebauer demonstrates some performance improvements to partitioned columnstore indexes in SQL Server 2016:

266ms was the partitioned table under SQL Server 2016 (compatibility level 120) while 353ms of the total elapsed time was obtained on SQL Server 2014! This represents a solid 25% improvement

All execution plans will have the same iterators, but will differ on the overall estimated cost (the non-partitioned queries will be way lower than the partitioned ones), as well as the distribution of the estimated costs within the execution plan, but as for the rest – it will be quite similar, like the one shown on the image below:

These improvements were swamped by the aggregate predicate pushdown improvements in 2016, at least in Niko’s example, but I’ll take a free 25%-33% performance improvement.

Indexed Views On Clustered Columnstore Indexes

If you’re using SQL Server 2014 and want to create a non-clustered rowstore index on top of your clustered columnstore index, Niko Neugebauer has you covered:

Here we have a beautiful and a simple execution plan, which delivers what we need – great performance with a relative low overall cost. And in this way we enjoy the possibility to get the best out of the 2 worlds – Columnstore & Rowstore.
Should we need to run a similar query but agains the whole dataset, it will be redirected to our Columnstore Index which will deliver great performance:

Alternatively, upgrade to SQL Server 2016 and you get this without introducing an indexed view into the mix.

Columnstore Indexes On Cloned Databases

Parikshit Savjani has a script to update columnstore index statistics before running DBCC CLONEDATABASE:

Unlike traditional Btree indexes, when a columnstore index is created, there is no index statistics created on the columns of the columnstore indexes. However, there is an empty stats object created with the same name as columnstore index and an entry is added to sys.stats at the time of index creation. The stats object is populated on the fly when a query is executed against the columnstore index or when executing DBCC SHOW_STATISTICS against the columnstore index, but the columnstore index statistics aren’t persisted in the storage. The index statistics is different from the auto created statistics on the individual columns of columnstore indexes which is generated on the fly and persisted in the statistics object. Since the index statistics is
not persisted in storage, the clonedatabase will not contain those statistics leading to inaccurate stats and different query plans when same query has run against database clone as opposed to production database.

Click through for the script.

Aggregate Predicate Pushdown And Data Types

Niko Neugebauer shows an example of how a slightly different data type can cause columnstore queries to be much faster:

Even though they are estimated to cost the same (50% for each one) with the estimated cost of 0.275286 to be more precise in this sense.
To be more precise in the reality you will notice the Aggregate Predicate Pushdown taking place on the first query, while the second query is using the Storage Engine to read out all of the 2 million rows from the table and filter it in the Hash Match iterator.

Actual Number of Locally Aggregated Rows
is the one property on the Columnstore Index Scan iterator that will give you an insight on what happened within the Columnstore Index Scan, since the Aggregate Predicate Pushdown is not shown as a filter on the property. This is not the most fortunate solution as far as I am concerned, but since the 0 rows flowing out of the Columnstore Index Scan will serve as a good indication that Aggregate Predicate Pushdown took place, but if you want to be sure of all the details you will need to check the properties of the involved iterators.

Definitely worth reading.

Memory-Optimized Clustered Columnstore Indexes

Ned Otter contrasts memory-optimized clustered columnstore indexes with traditional, disk-based clustered columnstore indexes:

5. Parallelism

On-disk: When you query an on-disk table that has a columnstore index, the database engine can use parallelism to process the results more quickly.

Memory-optimized: When you query a memory-optimized table that has a columnstore index, the database engine can use parallelism to process the results more quickly, BUT, that statement is only true if you use interop. Natively compiled modules are always executed serially.

Click through for the rest of the comparison points as well as a repro script.

Identity Integers And Columnstore

Niko Neugebauer looks at how clustered columnstore indexes handle identity integers with respect to data load times:

This blog post will try to respond this question from the perspective of the data loading performance.

For this research I decided to pick 3 distinct scenarios to investigate, which refer to different ways to approach the solution:
– a CCI table with an Identity column
– a CCI table with a Sequence as a default value
– a CCI table without Identity

There’s a pretty substantial performance difference, so this is well worth the read for large columnstore data loads.

Zero, One, Close Enough

Kendra Little points out a columnstore optimization which leaves a strange execution plan as a result:

I have a very simple query. It’s running against a table with a nonclustered columnstore index.

SELECT COUNT(*) FROM pt.FirstNameByBirthDate_1966_2015;

GO

The query returns one row, as expected. Here’s my count:

For the record, that is the correct number of rows in the table. Here’s where things get weird. In the actual execution plan, the columnstore index returns zero rows.

Yes, this is really the actual execution plan. I’m not tricking you, I promise.

Click through for the answer.

Merging With Columnstore

Niko Neugebauer does not like the MERGE statement when applied to columnstore indexes:

This blog post is focused on the MERGE statement for the Columnstore Indexes, or as I call it – the worst enemy of the Columnstore Indexes. It is extremely difficult to imagine some statement or way of making the worst out of the Columnstore Indexes, if not the infamous MERGE statement. Why ? Because it is not only making Columnstore Indexes perform slow, it will make them perform MUCH SLOWER then any Rowstore Indexes. Yes, you have read right – slower then ANY_ROWSTORE_INDEXES. In fact, this should be a hint that one should apply to the Merge statement, when it is executed against Columnstore Indexes! 🙂
I decide to dedicate a whole blog post on this matter, mainly to warn people of this pretty problematic statement – I hope not to see it being used for Columnstore Indexes in the future!

There is very little room for misunderstanding in Niko’s post.

Looking At Columnstore In vNext

Dmitry Pilugin looks at non-clustered columnstore indexes in vNext:

Note: Notice, that I created a varchar(max) column in a table, that was done deliberately, because I wanted to test if LOB columns are supported in vNext. Unfortunatelly, when I tried to create a nonclustered Columnstore index, I got an error:
The statement failed. Column ‘Note’ has a data type that cannot participate in a columnstore index. Omit column ‘Note’.

However, if you create a clustered Columnstore, you will succeed! Though, you will get the following warning in CTP 1.2:

Warning: Using Lob types (NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX)) with Clustered Columnstore Index is in public preview. Do not use it on production data without backup during public preview period.

I think it is great that we can now have a LOB varchar columns in Columnstore, because it was a blocker for some scenarios, at least I have seen some of those.

There are a few interesting findings here, so check it out.

Categories

April 2017
MTWTFSS
« Mar  
 12
3456789
10111213141516
17181920212223
24252627282930