Category: Columnstore

Automating Columnstore Index Partition Rebuilds

Published 2020-07-07 by Kevin Feasel

This post provides an example of a stored procedure which A) identifies the last two partitions of an Azure Synapse Analytics SQL pool table (which uses the columnstore index (default)) and B) rebuilds the index for these two partitions. Additionally, a sample PowerShell-based Azure Automation runbook is included for scheduling the execution of this procedure.
This post follows up on the previous post regarding a Power BI template to be used to analyze the health or quality of a columnstore index. For example, the template shared may help you find that the last one or two partitions such as partition numbers 39 and 40 out of 40 partitions may have many open (uncompressed) and/or not-optimized rowgroups. The cause of these low quality partitions could be that recent and ongoing data processing events are impacting these partitions (inserts,updates). Perhaps partitions 39 and 40 refer to the current and prior month for example.

Read on for the link to the script, as well as details on how to use it.

Comments closed

Tips for Optimizing Columnstore Indexes

Published 2020-06-29 by Kevin Feasel

Ed Pollack continues a series on columnstore indexes:

This is worth a second mention: Avoid updates at all costs! Columnstore indexes do not treat updates efficiently. Sometimes they will perform well, especially against smaller tables, but against a large columnstore index, updates can be extremely expensive.
If data must be updated, structure it as a single delete operation followed by a single insert operation. This will take far less time to execute, cause less contention, and consume far fewer system resources.

Read on for several more tips along these lines.

Comments closed

The Architecture of Columnstore Indexes

Published 2020-05-27 by Kevin Feasel

Ed Pollack has started a series on columnstore indexing:

By storing data grouped by columns, like values can be grouped together and therefore compress very effectively. This compression will often reduce the size of a table by 10x and offers significant improvements over standard SQL Server compression.
For example, if a table with a billion rows has an ID lookup column that has 100 distinct values, then on average each value will be repeated 10 million times. Compressing sequences of the same value is easy and results in a tiny storage footprint.
Just like standard compression, when columnstore data is read into memory, it remains compressed. It is not decompressed until runtime when needed. As a result, less memory is used when processing analytic queries. This allows more data to fit in memory at one time, and the more operations that can be performed in memory, the faster queries can execute.

In scenarios where it makes sense, I absolutely love clustered columnstore indexes.

Comments closed

Columnstore and Memory-Optimized tempdb

Published 2020-05-08 by Kevin Feasel

Erik Darling has a bucket of cold water for us:

In SQL Server 2019:
– Exciting stuff: In memory tempdb!
– Exciting stuff: sp_estimate_data_compression_savings can evaluate columnstore compression!
– Disappointing stuff: If you use in memory tempdb, you can’t have any columnstore anything in tempdb
That means if you’re using sneaky tricks like clustered columnstore indexes on temp tables to induce batch mode, you’re gonna get a lot of errors.
Likewise, you won’t be able to evaluate if columnstore will help your tables.

Click through to understand the extent of this limitation. Hopefully this is something we see addressed in vNext and a CU for 2019.

Comments closed

SQL Server 2019 and Columnstore Cleanup Issues

Published 2020-04-27 by Kevin Feasel

Taryn Pratt shares an issue with columnstore cleanup in SQL Server 2019:

The output of sp_who2 was repeatedly showing GHOST CLEANUP and CREATE INDEX. Over and over and over again. To be clear, I’m not a clustered columnstore expert, I know enough to be able to maintain them as needed. I went to Twitter and mentioned what I was seeing. I was advised by @sqL_handLe to try trace flag 661 which disables the ghost record removal process, and by Joe Obbish via Erik Darling to enable trace flag 634 to disable the tuple mover background task.
Initially, we enabled trace flag 634, but the logs continued to grow. We disabled trace flag 634. Then we enabled trace flag 661, and the logs continued to grow, so we disabled it. Finally, we tried enabling both of the trace flags. The big jumps stopped, but we now had about 400GB of logs that needed to be written to the reporting cluster before we could perform the failover.
While the logs were exploding we wondered if whatever was happening might have been caused by the deletions we did in early February. But why would they be triggered by the upgrade to SQL Server 2019?

Read the whole thing if you’re looking at a migration to 2019.

Comments closed

Rebuilding Rowstore Indexes Online on Tables with Columnstore Indexes

Published 2020-02-07 by Kevin Feasel

Niko Neugebauer explains something about rebuilding rowstore indexes:

This blogpost will be about older SQL Server versions (2016, 2017) and some implications that I have found that people rather do not understand, until they hit the problems in productions – the ONLINE creation & rebuild operations for the Rowstore indexes when having Columnstore Indexes on their tables.
As you should know by now – SQL Server 2017 & SQL Server 2019 respectively brought the ONLINE features support for the Columnstore Indexes (Nonclustered(2017) & Clustered(2019)) and I have blogged about those news in Columnstore Indexes – part 96 (“Nonclustered Columnstore Index Online Rebuild”) in 2017 and in Columnstore Indexes – part 123 (“Clustered Columnstore Index Online Rebuild”) in 2018.
What I honestly did not expect is the amount of the same question I have faced as in the last 3 months regarding the rebuild operations for the the ONLINE creation & rebuild operations for the Rowstore indexes when having Columnstore Indexes.

Read on to see the demonstration for when you have a clustered columnstore index and a nonclustered columnstore index.

Comments closed

Columnstore and a Scheduler per Partition

Published 2020-02-04 by Kevin Feasel

Aaron Bertrand continues looking at columnstore indexing:

The idea is I would have eight instances of an application (SQLQueryStress) that would each run on a dedicated scheduler, handling only the data destined for a specific partition / filegroup / data file, ~100 million rows at a time

It was interesting seeing how to pull that off. Aaron also has some extended findings, so check it out.

Comments closed

Partitioning on Columnstore Table Loading

Published 2020-01-09 by Kevin Feasel

Aaron Bertrand continues a series around learning about columnstore indexes:

In part 1, I showed how both page and columnstore compression could reduce the size of a 1TB table by 80% or more. While I was impressed I could shrink a table from 1TB to 50GB, I wasn’t very happy with the amount of time it took (anywhere from 2 to 14 hours). With some tips graciously borrowed from folks like Joe Obbish, Lonny Niederstadt, Niko Neugebauer, and others, in this post I will try to make some changes to my original attempt to get better load performance. Since the regular columnstore index didn’t compress better than page compression on this data set, and took 13 hours longer to get there, I’ll focus solely on the more advanced solution using COLUMNSTORE_ARCHIVE compression.

Click through for part 2.

Comments closed

Columnstore Indexes in Azure SQL Database

Published 2019-12-31 by Kevin Feasel

Niko Neugebauer takes us through the columnstore offerings available in Azure SQL Database:

Almost 2 years ago (22nd of March 2018) in Columnstore Indexes – part 121 (“Columnstore Indexes on Standard Tier of Azure SQL DB”) I have already mentioned that Columnstore Indexes were available in Azure SQL Database in Standard 3 (S3) edition and higher, while people I meet keep on mentioning and believing that in order to get Columnstore Indexes one needs to use Premium editions.
Since that blog post a lot of time has passed and in the mean time we have got new tiers with new generations of provisioned General Purpose tiers (Generation 4, Generation 5, FSv2 Series & M Series) appearing, plus the Serverless Tier and not to forget the very promising Hyperscale tier … besides the Azure SQL Database Managed Instance of course, which has already been generally available for some time and the good old Elastic Pools which were never mentioned in original article.

It sounds like, on the whole, columnstore is a normal part of Azure SQL Database across the board—it’s not a special add-on feature.

Comments closed

Columnstore Versus Page Compression

Published 2019-12-27 by Kevin Feasel

Aaron Bertrand compares columnstore and page compression on a specific table:

Recently someone at work asked for more space to accommodate a rapidly growing table. At the time it had 3.75 billion rows, presented on 143 million pages, and occupying ~1.14TB. Of course we can always throw more disk at a table, but I wanted to see if we could scale this more efficiently than the current linear trend. Sounds like a great job for compression, right? But I also wanted to try out some other solutions, including columnstore – which people are surprisingly reluctant to try. I am no Niko, but I wanted to make an effort to see what it could do for us here.
Note that I am not focusing on reporting workload or other read query performance at this time – I merely want to see what impact I can have on storage (and memory) footprint of this data.
Here is the original table. I’ve changed table and column names to protect the innocent, but everything else is relatively accurate.

Page compression won, and I’ve got a pretty good idea why (though some of the diagnostic info is gone): Aaron has several VARCHAR and NVARCHAR columns, and those blow up the columnstore dictionary pretty fast. Aaron has more to go in this series, so stay tuned.

Comments closed