Avoid Key Lookups On Clustered Columnstore Indexes

Joey D’Antoni points out a potential big performance problem with clustered columnstore indexes:

In the last year or so, with a large customer who makes fairly heavy use of this pattern, I’ve noticed another concern. Sometimes, and I can’t figure out what exactly triggers it, the execution plan generated, will do a seek against the nonclustered index and then do a key lookup against the columnstore as seen below. This is bad for two reasons–first the key lookup is super expensive, and generally columnstores are very large, secondly this key lookup is in row execution  mode rather than batch and drops the rest of the execution plan into row mode, thus slowing the query down even further.

Joey also has a UserVoice item as well, so check it out.

Tempdb Blocking With Non-Clustered Columnstore Indexes

Ned Otter runs into a tricky issue:

I have a client that used Itzik Ben-Gan’s solution of creating a filtered nonclustered columnstore index to achieve batch mode on a rowstore (in fact I proposed that the client consider it). They have an OLTP system, and often perform YTD calculations. When they tested, processing time was reduced by 30 to 50 percent, without touching a single line of application code. If that ain’t low hanging fruit, I don’t know what is —

However, during testing, I noticed some intermittent blocking that didn’t make sense to me. But I couldn’t nail it down, and they went live with the “filtered nonclustered columnstore index” solution.

Once they deployed – and there was a lot of concurrency – I could see what had eluded me during my proof of concept: blocking in tempdb.

Read on for the repro and check out Ned’s UserVoice bug report.

Cloning And Columnstore Statistics

Niko Neugebauer points out a fix in SQL Server 2019:

I have a huge love for the DBCC CLONEDATABASE command – it has been made available (backported) to every SQL Server version starting with SQL Server 2012, since the original release in SQL Server 2014, while being constantly improved in the Service Packs and Cumulative Updates.

This blog post is focusing on the Database Cloning improvement in the SQL Server 2019 that is already available in the public CTP 2.0 – the possibility of the automated statistics extraction for the Columnstore Indexes.
WHY ?
Well, there was quite a significant problem with the Columnstore Indexes previously – the statistics for them were not extracted into the cloned database, unless you did created the statistics in the most recent step before Database cloning.

Click through for more details and a comparison between SQL Server versions.

Hybrid Columnstore And B+ Tree Designs

Adrian Colyer reviews a Microsoft paper on the combination of columnstore and B+ tree indexes on a single table:

The authors conducted a series of microbenchmarks as follows:

  • scans with single predicates with varying selectivity to study the trade-off between the range scan of a B+ tree vs a columnstore scan

  • sort and group-by queries to study the benefit of the sort order supported by B+ trees (columnstores in SQL Server are not sorted).

  • update statements with varying numbers of updated rows to analyze the cost of updating the different index types

  • mixed workloads with different combinations of reads and updates

It’s interesting to read an academic paper covering the topic, particularly when you can confirm that it works well in practice too.

Estimating Columnstore Index Size

Niko Neugebauer is somewhat impressed with an update to sp_estimate_data_compression_savings:

The final results were pretty much amazing – 2.1 GB, making the estimation of the sys.sp_estimate_data_compression_savings System Stored Procedure much more precise then my own function! This leaves me very happy and makes me want to investigate and learn how this new stored procedure is capable of providing better estimations.

I decided to test on the other tables within TPCH database and my test on the Orders table have shown a different situation where the 0.7 GB estimation of the sys.sp_estimate_data_compression_savings Stored Procedure were pretty much offbeat when comparing to the CISL dbo.cstore_sp_estimate_columnstore_compression_savings – showing 0.92 GB while the end result was 0.89 GB.
I guess the final answer is that it will depend, but that the estimation of the new stored procedure are not totally offbeat is an incredibly good sign, though I would still keep an eye or even two on the provided estimated results.

Read the whole thing for more details on these examples.

Where Columnar Databases Struggle

Teo Lachev makes a good point regarding columnar databases:

A large company uses the SAP HANA ERP system. Users requires real-time access to transactional data. To avoid performance degradation, SLT replication (trigger-based change data capture) replicates data to another SAP HANA system that is used solely for reporting. The problem is that the more detailed the report gets and the more columns it has, the slower it gets and SAP HANA throws out of memory exceptions.

SAP HANA is an in-memory columnar database like Tabular. So, it stores data in columns, not rows. Columnar databases are primarily designed for analytical reports which typically have a few columns (sales by customer, product, date), but can potentially aggregate large datasets. As the reporting grain lowers and more columns are added (order number, order line item, customer name, phone number, etc.), a columnar database has to cross-join more and more columns. This is not efficient and performance quickly degrades irrespective that storage is fast. SSAS Tabular and Power BI are no different. SAP HANA complicates the issue further by preventing direct access to tables and requiring “analytical” views that join tables and potentially nest other views.

Read the whole thing.  Teo has a great point:  there are trade-offs between different data platform technologies, and choosing the right one is important.

When Rowstore Compression Beats Columnstore

Joe Obbish looks at scenarios where page-level compression on rowstore tables can beat columnstore compression in terms of resultant table size:

It’s certainly more difficult to come up with a demo that works without string columns, but consider how the page compression algorithm works. Data can be compressed on page basis, which includes both multiple rows and multiple columns. That means that page compression can achieve a higher compression ratio when a row has identical values in different columns. Columnstore is only able to compress on an individual column basis and you won’t directly see better compression with repeated values in different columns for a single row (as far as I know).

Interestingly, Joe also comes up with a scenario where row-level compression can beat columnstore even without string values.  All this said, the normal case when dealing with non-string data is that columnstore tends to compress a lot better.

Clustered Columnstore Index Online Rebuild

Niko Neugebauer looks at a feature which will pop up in SQL Server vNext:

The current state of the Clustered Columnstore Index ONLINE rebuild points to be an unfinished version, which will definitely get vastly improved before being released & supported in SQL Server. I have seen a couple of deadlocks and canceled transactions and so I decided that this blog post will get updated as soon as there will be an official announcement of this feature.
If you are still looking to start working on this feature, then I would suggest trying it on smaller tables. Like really, really small ones.
Oh, and for online rebuild operation focus on using partition rebuild – you are using the partitioning, right ? 🙂

Niko gave this a try in Azure SQL Database, as there is no publicly available version of SQL Server which supports this.  I’ve been waiting for this feature for 3 years now, so I’ll be happy to see it in production.

Trickle Insertion With Clustered Columnstore Indexes

Sunil Agarwal provides a pattern for trickle loading clustered columnstore indexes:

A traditional scenario of loading data into CCI is a nightly load from one or more data files containing millions of rows. Recommended technique is to load the data with batchsize >= 102400 as explained https://blogs.msdn.microsoft.com/sqlserverstorageengine/2014/07/27/clustered-column-store-index-bulk-loading-the-data/. However, we are seeing many scenarios where data source is parallel data stream (i.e. trickle insert) to be loaded to CCI for analytics, a typical IOT scenario. CCI allows concurrent data streams into the same delta rowgroup. However, you will see higher page latch contention as you increase the concurrency. You may wonder why this is so? Well, each delta RG is organized internally as a clustered btree index and the dataload follows the pattern of monotonically increasing clustered index key causing latch contention on the last page.

Check out Sunil’s post.  I also have an older post covering my experiences with CCI trickle loads and three ETL patterns which can work.

Wait Stats Related To Columnstore Indexes

Niko Neugebauer has some documentation on important wait stats around columnstore:

I split the known wait types into the following distinct groups:
– Batch Execution Mode (HTBUILD, HTDELETE, HTMEMO, HTREINIT, HTREPARTITION, PWAIT_QRY_BPMEMORY, BPSORT)
– Columnstore Indexes (ROWGROUP_VERSION, ROWGROUP_OP_STATS, SHARED_DELTASTORE_CREATION, COLUMNSTORE_BUILD_THROTTLE, COLUMNSTORE_MIGRATION_BACKGROUND_TASK, COLUMNSTORE_COLUMNDATASET_SESSION_LIST)

With an appearance in the next SQL Server version of the Batch Execution Mode for the RowStore Indexes, the first group of the waits will suddenly be becoming more important for every single SQL Server user and mixing it together with the Wait Types specific to the internal structures and functions of the Columnstore Indexes makes no sense.

Read on to learn more about these important wait types.

Categories

November 2018
MTWTFSS
« Oct  
 1234
567891011
12131415161718
19202122232425
2627282930