
Category: Columnstore

The Tuple Mover in SQL Server 2019

Taryn Pratt gives us closure on an issue from a few months back:

I suggest reading my other post first, it’ll only take a few minutes. I’ll wait…

However, if you really don’t want to read it, here’s a quick recap on the initial issue.

In early February 2020, a lot of data was deleted from some clustered columnstore indexes in our PRIZM database. Some of the tables were rebuilt, but 11 tables weren’t, since we don’t have maintenance windows and rebuilding them would involve downtime. The rebuilds would happen once we upgraded to SQL Server 2019, to take advantage of the ability to rebuild those columnstore indexes online.

Taryn now has the full story and I recommend giving it a read.
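
For reference, the online rebuild that became possible in SQL Server 2019 is a one-liner. This is a minimal sketch; the table and index names are hypothetical, not anything from the PRIZM database:

    -- Rebuild a clustered columnstore index online (SQL Server 2019+);
    -- CCI_PostHistory and dbo.PostHistory are hypothetical names
    ALTER INDEX CCI_PostHistory ON dbo.PostHistory
    REBUILD WITH (ONLINE = ON);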


Getting Started with Redshift

Rahul Mehta has a primer on AWS Redshift:

AWS Redshift is a columnar data warehouse service on AWS cloud that can scale to petabytes of storage, and the infrastructure for hosting this warehouse is fully managed by AWS cloud. Redshift operates in a clustered model with a leader node and multiple worker nodes, like other clustered or distributed database models in general. It is based on Postgres, so it shares a lot of similarities with Postgres, including the query language, which is nearly identical to Structured Query Language (SQL). Redshift supports creating almost all the major database objects like Databases, Tables, Views, and even Stored Procedures. In this article, we will explore how to create your first Redshift cluster on AWS and start operating it.

I’m not really the biggest fan of Redshift around, but Rahul does a good job walking us through the basics of the product.
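
If you want a taste of the dialect before clicking through, here is a minimal sketch of a Redshift table definition; the table and key choices are hypothetical:

    -- Redshift extends Postgres-style DDL with distribution and sort keys;
    -- the sales table and its columns are made up for illustration
    CREATE TABLE sales (
        sale_id   BIGINT,
        sale_date DATE,
        amount    DECIMAL(10,2)
    )
    DISTSTYLE KEY
    DISTKEY (sale_id)
    SORTKEY (sale_date);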


Automating Columnstore Index Partition Rebuilds

Brett Powell has a procedure for us:

This post provides an example of a stored procedure which A) identifies the last two partitions of an Azure Synapse Analytics SQL pool table (which uses the default clustered columnstore index) and B) rebuilds the index for these two partitions. Additionally, a sample PowerShell-based Azure Automation runbook is included for scheduling the execution of this procedure.

This post follows up on the previous post regarding a Power BI template for analyzing the health or quality of a columnstore index. For example, the template shared may help you find that the last one or two partitions, such as partition numbers 39 and 40 out of 40 partitions, have many open (uncompressed) and/or not-optimized rowgroups. The cause of these low-quality partitions could be that recent and ongoing data processing events (inserts, updates) are impacting these partitions. Perhaps partitions 39 and 40 refer to the current and prior month, for example.

Read on for the link to the script, as well as details on how to use it.
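
As a rough sketch of the idea (this is not Brett's procedure, and dbo.FactSales is a hypothetical table), the core logic might look like this:

    DECLARE @last INT, @prev INT;

    -- Find the highest partition number on the hypothetical dbo.FactSales
    SELECT @last = MAX(partition_number)
    FROM sys.partitions
    WHERE object_id = OBJECT_ID('dbo.FactSales');

    -- PARTITION = accepts a constant or a variable, not an expression,
    -- so compute the prior partition number up front
    SET @prev = @last - 1;

    -- Rebuild the trailing two partitions, where recent writes land
    ALTER INDEX ALL ON dbo.FactSales REBUILD PARTITION = @last;
    IF @prev >= 1
        ALTER INDEX ALL ON dbo.FactSales REBUILD PARTITION = @prev;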


Tips for Optimizing Columnstore Indexes

Ed Pollack continues a series on columnstore indexes:

This is worth a second mention: Avoid updates at all costs! Columnstore indexes do not treat updates efficiently. Sometimes they will perform well, especially against smaller tables, but against a large columnstore index, updates can be extremely expensive.

If data must be updated, structure it as a single delete operation followed by a single insert operation. This will take far less time to execute, cause less contention, and consume far fewer system resources.

Read on for several more tips along these lines.
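
To make that pattern concrete, here is a minimal sketch; dbo.FactSales and dbo.FactSales_Changes are hypothetical tables, with the changed rows staged in the latter:

    BEGIN TRANSACTION;

    -- One set-based delete of the rows being changed
    DELETE f
    FROM dbo.FactSales AS f
    INNER JOIN dbo.FactSales_Changes AS c
        ON f.SaleID = c.SaleID;

    -- One set-based insert of their replacement values
    INSERT INTO dbo.FactSales (SaleID, SaleDate, Amount)
    SELECT SaleID, SaleDate, Amount
    FROM dbo.FactSales_Changes;

    COMMIT TRANSACTION;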


The Architecture of Columnstore Indexes

Ed Pollack has started a series on columnstore indexing:

By storing data grouped by columns, like values can be grouped together and therefore compress very effectively. This compression will often reduce the size of a table by 10x and offers significant improvements over standard SQL Server compression.

For example, if a table with a billion rows has an ID lookup column that has 100 distinct values, then on average each value will be repeated 10 million times. Compressing sequences of the same value is easy and results in a tiny storage footprint.

Just like standard compression, when columnstore data is read into memory, it remains compressed. It is not decompressed until runtime when needed. As a result, less memory is used when processing analytic queries. This allows more data to fit in memory at one time, and the more operations that can be performed in memory, the faster queries can execute.

In scenarios where it makes sense, I absolutely love clustered columnstore indexes.
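
Creating one is a single statement. A minimal sketch, with a hypothetical table name:

    -- Build a clustered columnstore index on dbo.FactSales (a hypothetical
    -- heap); add WITH (DROP_EXISTING = ON) to convert a table that already
    -- has a clustered rowstore index
    CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
    ON dbo.FactSales;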


Columnstore and Memory-Optimized tempdb

Erik Darling has a bucket of cold water for us:

In SQL Server 2019:

– Exciting stuff: In memory tempdb!
– Exciting stuff: sp_estimate_data_compression_savings can evaluate columnstore compression!
– Disappointing stuff: If you use in memory tempdb, you can’t have any columnstore anything in tempdb

That means if you’re using sneaky tricks like clustered columnstore indexes on temp tables to induce batch mode, you’re gonna get a lot of errors.

Likewise, you won’t be able to evaluate if columnstore will help your tables.

Click through to understand the extent of this limitation. Hopefully this is something we see addressed in vNext and a CU for 2019.
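
The trick Erik mentions looks roughly like this sketch, and it is the CREATE INDEX step that errors once the feature is enabled:

    -- Enable memory-optimized tempdb metadata (server-scoped; takes effect
    -- after a restart)
    ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;

    -- With the feature on, the columnstore-on-a-temp-table trick fails;
    -- #batch_mode is a hypothetical temp table
    CREATE TABLE #batch_mode (i INT NOT NULL);
    CREATE CLUSTERED COLUMNSTORE INDEX cci ON #batch_mode;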


SQL Server 2019 and Columnstore Cleanup Issues

Taryn Pratt shares an issue with columnstore cleanup in SQL Server 2019:

The output of sp_who2 was repeatedly showing GHOST CLEANUP and CREATE INDEX. Over and over and over again. To be clear, I’m not a clustered columnstore expert; I know enough to be able to maintain them as needed. I went to Twitter and mentioned what I was seeing. I was advised by @sqL_handLe to try trace flag 661, which disables the ghost record removal process, and by Joe Obbish (via Erik Darling) to enable trace flag 634 to disable the tuple mover background task.

Initially, we enabled trace flag 634, but the logs continued to grow. We disabled trace flag 634. Then we enabled trace flag 661, and the logs continued to grow, so we disabled it. Finally, we tried enabling both of the trace flags. The big jumps stopped, but we now had about 400GB of logs that needed to be written to the reporting cluster before we could perform the failover.

While the logs were exploding, we wondered if whatever was happening might have been caused by the deletions we did in early February. But then why would it be triggered by the upgrade to SQL Server 2019?

Read the whole thing if you’re looking at a migration to 2019.
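
For reference, the trace flags in question are enabled globally like so, with -1 applying them server-wide:

    DBCC TRACEON (634, -1);   -- disable the tuple mover background task
    DBCC TRACEON (661, -1);   -- disable the ghost record removal process

    -- and to revert once finished
    DBCC TRACEOFF (634, -1);
    DBCC TRACEOFF (661, -1);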


Rebuilding Rowstore Indexes Online on Tables with Columnstore Indexes

Niko Neugebauer explains something about rebuilding rowstore indexes:

This blog post will be about older SQL Server versions (2016 and 2017) and some implications that I have found people do not really understand until they hit the problems in production: the ONLINE creation & rebuild operations for rowstore indexes when there are Columnstore Indexes on their tables.
As you should know by now, SQL Server 2017 & SQL Server 2019 respectively brought ONLINE support for Columnstore Indexes (Nonclustered in 2017, Clustered in 2019), and I blogged about this news in Columnstore Indexes – part 96 (“Nonclustered Columnstore Index Online Rebuild”) in 2017 and in Columnstore Indexes – part 123 (“Clustered Columnstore Index Online Rebuild”) in 2018.

What I honestly did not expect is how often I would face the same question over the last 3 months regarding the ONLINE creation & rebuild operations for rowstore indexes when Columnstore Indexes are present.

Read on to see demonstrations of both cases: when you have a clustered columnstore index and when you have a nonclustered columnstore index.
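
The operation under discussion is a plain rowstore rebuild with ONLINE = ON, along these lines; the index and table names are hypothetical:

    -- Online rebuild of a rowstore nonclustered index on a table that also
    -- carries a columnstore index; whether this succeeds depends on version
    ALTER INDEX IX_FactSales_SaleDate ON dbo.FactSales
    REBUILD WITH (ONLINE = ON);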


Partitioning on Columnstore Table Loading

Aaron Bertrand continues a series around learning about columnstore indexes:

In part 1, I showed how both page and columnstore compression could reduce the size of a 1TB table by 80% or more. While I was impressed I could shrink a table from 1TB to 50GB, I wasn’t very happy with the amount of time it took (anywhere from 2 to 14 hours). With some tips graciously borrowed from folks like Joe Obbish, Lonny Niederstadt, Niko Neugebauer, and others, in this post I will try to make some changes to my original attempt to get better load performance. Since the regular columnstore index didn’t compress better than page compression on this data set, and took 13 hours longer to get there, I’ll focus solely on the more advanced solution using COLUMNSTORE_ARCHIVE compression.

Click through for part 2.
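
For reference, applying archive compression is itself a rebuild, roughly like this sketch with hypothetical names:

    -- Rebuild every partition of the columnstore index with archive compression
    ALTER INDEX CCI_FactSales ON dbo.FactSales
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);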
