Press "Enter" to skip to content

Category: Columnstore

Ordered Columnstore Indexes in SQL Server 2022

Ed Pollack gives us the scoop on ordered columnstore indexes:

One of the more challenging technical details of columnstore indexes that regularly gets attention is the need for data to be ordered to allow for segment elimination. In a non-clustered columnstore index, data order is automatically applied based on the order of the underlying rowstore data. In a clustered columnstore index, though, data order is not enforced by any SQL Server process. This leaves managing data order to us, which may or may not be an easy task.

To assist with this challenge, SQL Server 2022 has added the ability to specify an ORDER clause when creating or rebuilding an index. This feature allows data to be automatically sorted by SQL Server as part of those insert or rebuild processes. This article dives into this feature, exploring both its usage and its limitations.

I’ve seen a couple places where ordered columnstore indexes make enough sense to use, though not as many as I had first anticipated. That might change over time, as we see additional columnstore development.

Comments closed

Finding Columnstore Index Storage and Memory Allocations

Jose Manuel Jurado Diaz has a few scripts for us:

Today, we got a new question how much is the size used by a columnstore index table at storage level and memory usage.

TSQL to obtain the total number of rows, size per schema, table and index.

Using the view sys.column_store_row_groups (Transact-SQL) – SQL Server | Microsoft Learn we could see the total number of rows and space usage at storage level.

Click through for that script, as well as a few more to learn how much space and memory that columnstore index is taking.

Comments closed

Bringing Order to a Columnstore Index

Tibor Karaszi puts columnstore ducks in a row:

Data for a columnstore index is divded in groups of approximate 1 million rows, rowgroups. Each rowgroup has a set of pages for each column. The set of pages for a column in a rowgroup is called a segment. SQL Server has meta-data for the lowest and highest value for a segment. There are no SEEKs in a columnstore index. But, SQL Server can use this meta-data to skip reading segments, with the knowledge that “this segment cannot contain any data that I need based on my predicates in my WHERE clause”.

Also, you might want to do these operations using MAXDOP 1, so we don’t have several threads muddling our neat segment alignment.

I’m not sure I actually set the ORDER BY clause on columnstore indexes all that often—a quick mental survey says maybe once, though that could be my own failing rather than a statement on the utility of ordered columnstore indexes.

Comments closed

Ordered Columnstore Indexes

Joe Obbish and Erik Darling tag team on this one. First, Joe looks at the details of what the CCI ordering process does:

The sort for inserting into an ordered columnstore is a DML request sort. It appears to use the same internal mechanism as the sort that’s added for inserting into partitioned columnstore tables. The difference is that the data is sorted by the specified columns instead of a calculated partition id. In my testing, the sort appears to be a best effort sort that does not spill to tempdb. This means that if SQL Server thinks there won’t be enough memory then the data will not be fully sorted. Parallel inserts have an additional complication. 

And Erik has a messy work-around:

Anyway, I decided to dig in and see what was going on behind the scenes. Which of course, means query plans, and bothering people who are really good at debuggers.

Most of the problems that you’ll run into in SQL Server will come from sorting data.

Whenever I have to think about Sorts, I head to this post about all the different Sorts you might see in a query plan.

Definitely read both posts.

Comments closed

Ordered Columnstore Indexes in SQL Server 2022

Brent Ozar appreciates order:

So essentially, every column has a whole bunch of indexes on it.

But there’s no order whatsoever as to which rows end up in which index.

This isn’t a problem for relatively small tables, but as you get to billion-row data warehouse fact tables where columnstore should really shine, performance gradually degrades. In data warehouses, fact tables often have a commonly filtered column, like SaleDate. However, until SQL Server 2022, even if you wanted a small SaleDate range, your query would likely check hundreds or thousands of row groups, each of which had a huge range of data.

But do read the whole thing, as it seems it’s not working correctly in CTP 2.0 of SQL Server 2022. It is quite useful in Azure Synapse Analytics dedicated SQL pools, at least—that I can confirm.

Comments closed

Diving into Vertipaq Compression

Ed Pollack explains how Vertipaq compression works to make columnstore indexes so efficient:

Columnstore compression is an impressive array of algorithms that can take large analytic tables and significantly reduce their storage footprint. In doing so, IO is also reduced, and query performance dramatically improved.

This article dives into one aspect of columnstore compression that tends to get buried in all of the hoopla surrounding how awesome columnstore indexes are: Vertipaq optimization. This is a critical component of the columnstore compression process, and understanding how it works can significantly improve the performance of analytic workloads while reducing the computing resources required for the underlying data.

Click through for the steps of the process.

Comments closed

Serializing Clustered Columnstore Index Deletes

Aaron Bertrand hit a wall:

At Stack Overflow, we have some tables using clustered columnstore indexes, and these work great for the majority of our workload. But we recently came across a situation where “perfect storms” — multiple processes all trying to delete from the same CCI — would overwhelm the CPU as they all went widely parallel and fought to complete their operation. Here’s what it looked like in SolarWinds® SQL Sentry®:

It looks bad. Click through to understand why and what Aaron & co did to prevent this issue. I typically have used queue tables on the other end of columnstore indexes: as a method for ensuring that we insert 1024*1024 rows at a time. This was particularly important in the 2016 days, as we had a problem in which trickle-loading a columnstore index would cause massive numbers of rowgroups with dozens of rows, though that issue was subsequently fixed.

Comments closed

Locking Issue with Columnstore Indexes

Joe Obbish troubleshoots an issue on tables with columnstore indexes:

I recently ran into a production issue where a SELECT query that referenced a NOLOCK-hinted table was hitting a 30 second query timeout. Query store wait stats suggested that the issue was blocking on a table with a nonclustered columnstore index (NCCI). This was quite unexpected to me and I was eventually able to produce a reproduction of the issue. I believe this to be a bug in SQL Server that’s present in both RTM and the current CU as of this blog post (CU14). The issue also impacts CCIs as well but I did significantly less testing with that index type.

Read on for the issue, how you can replicate it, and a couple ways to work around it.

Comments closed

Handling Tombstones in Cassandra

Payal Kumari takes us through tombstone management in Apache Cassandra:

Got too many tombstones? This blog post will talk about how to deal with tombstones once you already have them. For more information about tombstones, check out this post: Examining the Lifecycle of Tombstones in Apache Cassandra.

Click through for several techniques for handling tombstoned records in Cassandra. In SQL Server, with columnstore indexes, the prevention advice is similar (avoid deletion or updating of data) but the treatment options are quite different.

Comments closed