Category: Columnstore

Columnstore Dictionaries

Published 2017-06-26 by Kevin Feasel

Niko Neugebauer explains some interesting facts about columnstore index dictionaries:

From a recent experience at a customer, I had an opportunity to dive into the details of Columnstore Index dictionaries. I have to admit that, judging by what I have learned in recent days, my understanding of them was pretty low, and I would like to share what I have learned with everyone.

These are some of the key findings that I have discovered:
– The local dictionaries are not exclusively connected with just 1 Row Group, but with multiple ones;
– The dictionaries within Columnstore Indexes are compressed in a different way, depending on the type of the compression applied (Columnstore vs Columnstore Archival);

and let us dive into each one of them:

Read the whole thing.
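
As a rough way to check the first finding on your own data, the sketch below counts how many row groups point at each local dictionary. It assumes a hypothetical table named dbo.FactOnlineSales with a columnstore index; that name is not from Niko's post, and the query is only an illustration built on the sys.column_store_segments catalog view.

```sql
-- Sketch only: dbo.FactOnlineSales is a hypothetical table name.
-- Each row in sys.column_store_segments describes one column segment,
-- i.e. one column within one row group of one partition.
SELECT
    p.partition_number,
    s.column_id,
    s.secondary_dictionary_id    AS local_dictionary_id,
    COUNT(DISTINCT s.segment_id) AS row_groups_using_dictionary
FROM sys.column_store_segments AS s
INNER JOIN sys.partitions AS p
    ON p.partition_id = s.partition_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactOnlineSales')
  AND s.secondary_dictionary_id <> -1   -- -1 means no local dictionary
GROUP BY p.partition_number, s.column_id, s.secondary_dictionary_id
HAVING COUNT(DISTINCT s.segment_id) > 1 -- keep only shared dictionaries
ORDER BY p.partition_number, s.column_id, s.secondary_dictionary_id;
```

Any rows returned would be local dictionaries referenced by more than one row group.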


Trivial Plans And Columnstore Indexes

Published 2017-06-20 by Kevin Feasel

Erik Darling warns us that trivial plans against clustered columnstore indexes could lead to row execution rather than batch execution:

Let’s look at one query with a few variations.
SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM   dbo.t1 AS t
WHERE  t.Id > 0
       AND t.Id < 3;
The plan for it is alright. It’s fairly straightforward and the query finishes in about 170ms.

We can see from the graphical execution plan that it’s been Simple Parameterized. SQL Server does this to make plan caching more efficient.

Check out the entire post.
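
One commonly cited way to steer a query like this away from a trivial plan is to add a redundant subquery predicate, which also prevents simple parameterization. The sketch below applies that idea to the query above. It is an illustration under that assumption, not necessarily the fix Erik recommends, so compare the actual execution mode in your own plans.

```sql
-- Same query as above, plus a logically redundant predicate.
-- The subquery is always true, but its presence keeps the optimizer
-- from settling for a trivial plan, so full optimization is attempted.
SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM   dbo.t1 AS t
WHERE  t.Id > 0
       AND t.Id < 3
       AND 1 = (SELECT 1);
```

Checking the columnstore scan operator's actual execution mode before and after the change shows whether batch mode was used.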


Finding Clustered Columnstore Index Candidates

Published 2017-06-15 by Kevin Feasel

Sunil Agarwal has a script that helps you find potential clustered columnstore index candidates:

Most of us understand that a clustered columnstore index can typically provide 10x data compression and can speed up query performance up to 100x. While this all sounds good, the question is how do I know which tables in my database could potentially benefit from CCI? For a traditional DW scenario with star schema, the FACT table is an obvious choice to consider. However, many workloads including DW have grown organically and it is not trivial to identify tables that could benefit from CCI. So the question is how can I quickly identify a subset of tables suitable for CCI in my workload?

Interestingly, the answer lies in leveraging the DMVs that collect data access patterns in each of the tables. The following DMV query provides a first-order approximation to identify the list of tables suitable for CCI. It queries the HEAP or the rowstore clustered index using the DMV sys.dm_db_index_operational_stats to determine the access pattern on the base rowstore table, and so to find tables that meet the criteria listed in the comments below:

Read on for the script, which has a sensible set of criteria.
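
For a feel of the raw data such a script works from, here is a minimal sketch that summarizes access patterns per table from sys.dm_db_index_operational_stats. It is not Sunil's script and deliberately applies no thresholds; which ratios count as a good CCI candidate is left to the original post.

```sql
-- Sketch only: summarize scan, lookup, and modification counts for the
-- heap (index_id 0) or rowstore clustered index (index_id 1) of each
-- user table in the current database. Counters reset when the metadata
-- cache entry for the object is flushed, e.g. after a restart.
SELECT
    OBJECT_SCHEMA_NAME(ios.object_id)  AS schema_name,
    OBJECT_NAME(ios.object_id)         AS table_name,
    SUM(ios.range_scan_count)          AS range_scans,
    SUM(ios.singleton_lookup_count)    AS singleton_lookups,
    SUM(ios.leaf_insert_count)         AS leaf_inserts,
    SUM(ios.leaf_update_count)         AS leaf_updates,
    SUM(ios.leaf_delete_count)         AS leaf_deletes
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
WHERE ios.index_id IN (0, 1)
  AND OBJECTPROPERTY(ios.object_id, 'IsUserTable') = 1
GROUP BY ios.object_id
ORDER BY range_scans DESC;
```

Tables with many range scans and few lookups, updates, and deletes are the sort of pattern worth a closer look.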


Memory Grants Affecting Columnstore Load

Published 2017-06-06 by Kevin Feasel

Denzil Ribeiro explains how memory grant pressure can determine whether a columnstore bulk insert skips the deltastore or not:

We found that only at the beginning of the run, there was contention on memory grants (RESOURCE_SEMAPHORE waits), for a short period of time. After that and later into the process, we could see some latch contention on regular data pages, which we didn’t expect as each thread was supposed to insert into its own row group. You would also see this same data by querying sys.dm_exec_requests live, if you caught it within the first minute of execution, as displayed below.

This is useful in case you run into the issue, but also useful as a case study on effective troubleshooting.
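
If you want to watch for the same symptoms during a load, a query along these lines shows current waits next to memory grant details. It is a minimal sketch, not the query from Denzil's post.

```sql
-- Sketch only: list active requests that are currently waiting, with
-- any memory grant they hold or are queued for. RESOURCE_SEMAPHORE in
-- wait_type indicates a request waiting on a memory grant.
SELECT
    r.session_id,
    r.command,
    r.wait_type,
    r.wait_time            AS wait_time_ms,
    r.last_wait_type,
    r.wait_resource,
    mg.requested_memory_kb,
    mg.granted_memory_kb   -- NULL while the grant is still pending
FROM sys.dm_exec_requests AS r
LEFT JOIN sys.dm_exec_query_memory_grants AS mg
    ON  mg.session_id = r.session_id
    AND mg.request_id = r.request_id
WHERE r.session_id <> @@SPID
  AND r.wait_type IS NOT NULL
ORDER BY r.wait_time DESC;
```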


Memory Requirements For Columnstore Rebuild/Reorg

Published 2017-06-01 by Kevin Feasel

Niko Neugebauer looks at memory requirements for rebuilding and reorganizing columnstore indexes:

To spare all the Wows & how can’s, Microsoft was well aware of this problem and has delivered a solution with Cumulative Update 3 for SQL Server 2016 with Service Pack 1:
FIX: SQL Server 2016 consumes more memory when you reorganize a columnstore index, and here it is – a new trace flag 6404 (documented in the link and thus should be supported), that will allow you to lower the memory requirements for the ALTER INDEX … REORGANIZE command.
Let’s take it for the test, by once again running the setup workload for the FactOnlineSales_Reindex table and then executing the following command, enabling the Trace Flag 6404 and then reorganising our Clustered Columnstore Index:

This is a rather interesting post and once again makes me wish that clustered columnstore indexes could be rebuilt online.
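
The command itself is not reproduced in the excerpt, so the following is only a guess at its general shape. The index name CCI_FactOnlineSales_Reindex is hypothetical, and enabling the trace flag at session scope is an assumption; check the KB article for the scope Microsoft supports.

```sql
-- Sketch only. Assumptions: the index name is hypothetical, and trace
-- flag 6404 is enabled for the current session rather than globally.
DBCC TRACEON (6404);

ALTER INDEX CCI_FactOnlineSales_Reindex
    ON dbo.FactOnlineSales_Reindex
    REORGANIZE;

DBCC TRACEOFF (6404);
```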


Columnstore Performance Counters

Published 2017-05-30 by Kevin Feasel

Niko Neugebauer talks about perfmon counters available for understanding what’s going on with columnstore indexes:

As mentioned right at the beginning of this article, Sunil Agarwal led the development team in adding greatly valuable performance counters with their own object ‘MSSQL:Columnstore’, which provides some incredible insight into some of the internal operations that are not exposed in other ways.
This was a very much needed step, because SQL Server 2014 brought a lot of different performance counters and objects for In-Memory (XTP), while Columnstore Indexes deserved a good treatment of their own.

This gives you one more avenue for research if you’re experiencing columnstore-related issues.
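
Besides Performance Monitor, the same counters can usually be read from inside SQL Server. The sketch below assumes the object name contains the word Columnstore, as the quoted name suggests; the exact prefix varies with the instance name.

```sql
-- Sketch only: read columnstore-related counters from the DMV.
-- object_name and counter_name are fixed-width columns, hence RTRIM.
SELECT
    RTRIM(pc.object_name)   AS object_name,
    RTRIM(pc.counter_name)  AS counter_name,
    RTRIM(pc.instance_name) AS instance_name,
    pc.cntr_value
FROM sys.dm_os_performance_counters AS pc
WHERE pc.object_name LIKE N'%Columnstore%'
ORDER BY pc.counter_name, pc.instance_name;
```

Many counters are cumulative or per-second values, so compare two samples taken some time apart instead of reading a single value.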


Partitioned Columnstore Tables

Published 2017-04-27 by Kevin Feasel

Denny Cherry makes an important point about dealing with columnstore tables:

ColumnStore indexes are all the rage with data warehouses. They’re fast, they’re new(ish) and they solve all sorts of problems when dealing with massive amounts of data. However, they can cause some issues as well if you aren’t very careful about how you set up your partitions on the ColumnStore index. This is because you can’t split a ColumnStore partition once it contains data.

Now, if everything is going according to plan, you create your partitions well in advance and there are no issues.

However, if everything hasn’t gone according to plan and someone forgets to create the partitions and you end up with rows in the final partition, you can’t create any more partitions because you can’t split the partition.

Ideally, you get those ducks in a row first. Keep reading for a repro script and a couple potential workarounds.
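
A minimal sketch of getting those ducks in a row follows: confirm the last partition is still empty, then add the next boundary before any rows arrive. All object names (dbo.FactSales, ps_SalesDate, pf_SalesDate), the filegroup, and the boundary value are hypothetical and not taken from Denny's repro script.

```sql
-- Sketch only; every object name and the boundary date are hypothetical.

-- 1. Check row counts per partition. For a clustered columnstore
--    index, index_id is 1. The highest partition should show 0 rows.
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.FactSales')
  AND p.index_id = 1
ORDER BY p.partition_number;

-- 2. Tell the scheme which filegroup the next new partition will use.
ALTER PARTITION SCHEME ps_SalesDate NEXT USED [PRIMARY];

-- 3. Split off the new boundary while the affected range is empty.
ALTER PARTITION FUNCTION pf_SalesDate() SPLIT RANGE ('20170701');
```

Scheduling steps 2 and 3 well ahead of the data keeps the final partition empty, which is the situation the split needs.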


Columnstore Partitioning In SQL Server 2016

Published 2017-04-06 by Kevin Feasel

Niko Neugebauer demonstrates some performance improvements to partitioned columnstore indexes in SQL Server 2016:

The partitioned table took 266ms under SQL Server 2016 (compatibility level 120), while a total elapsed time of 353ms was obtained on SQL Server 2014! This represents a solid 25% improvement.

All execution plans will have the same iterators, but will differ on the overall estimated cost (the non-partitioned queries will be way lower than the partitioned ones), as well as the distribution of the estimated costs within the execution plan, but as for the rest – it will be quite similar, like the one shown on the image below:

These improvements were swamped by the aggregate predicate pushdown improvements in 2016, at least in Niko’s example, but I’ll take a free 25%-33% performance improvement.


Indexed Views On Clustered Columnstore Indexes

Published 2017-04-05 by Kevin Feasel

If you’re using SQL Server 2014 and want to create a non-clustered rowstore index on top of your clustered columnstore index, Niko Neugebauer has you covered:

Here we have a beautiful and simple execution plan, which delivers what we need – great performance with a relatively low overall cost. And in this way we enjoy the possibility to get the best out of the 2 worlds – Columnstore & Rowstore.
Should we need to run a similar query but against the whole dataset, it will be redirected to our Columnstore Index which will deliver great performance:

Alternatively, upgrade to SQL Server 2016 and you get this without introducing an indexed view into the mix.


Columnstore Indexes On Cloned Databases

Published 2017-04-04 by Kevin Feasel

Parikshit Savjani has a script to update columnstore index statistics before running DBCC CLONEDATABASE:

Unlike traditional Btree indexes, when a columnstore index is created, no index statistics are created on the columns of the columnstore index. However, an empty stats object is created with the same name as the columnstore index, and an entry is added to sys.stats at the time of index creation. The stats object is populated on the fly when a query is executed against the columnstore index or when executing DBCC SHOW_STATISTICS against the columnstore index, but the columnstore index statistics aren’t persisted in storage. The index statistics are different from the auto-created statistics on the individual columns of columnstore indexes, which are generated on the fly and persisted in the statistics object. Since the index statistics are not persisted in storage, the clone database will not contain those statistics, leading to inaccurate stats and different query plans when the same query is run against the database clone as opposed to the production database.

Click through for the script.
