Press "Enter" to skip to content

Category: Columnstore

Locking Issue with Columnstore Indexes

Joe Obbish troubleshoots an issue on tables with columnstore indexes:

I recently ran into a production issue where a SELECT query that referenced a NOLOCK-hinted table was hitting a 30 second query timeout. Query store wait stats suggested that the issue was blocking on a table with a nonclustered columnstore index (NCCI). This was quite unexpected to me and I was eventually able to produce a reproduction of the issue. I believe this to be a bug in SQL Server that’s present in both RTM and the current CU as of this blog post (CU14). The issue also impacts CCIs as well but I did significantly less testing with that index type.

Read on for the issue, how you can replicate it, and a couple ways to work around it.

Comments closed

Handling Tombstones in Cassandra

Payal Kumari takes us through tombstone management in Apache Cassandra:

Got too many tombstones? This blog post will talk about how to deal with tombstones once you already have them. For more information about tombstones, check out this post: Examining the Lifecycle of Tombstones in Apache Cassandra.

Click through for several techniques for handling tombstoned records in Cassandra. In SQL Server, with columnstore indexes, the prevention advice is similar (avoid deletion or updating of data) but the treatment options are quite different.

Comments closed

Persistent Computed Columns and Columnstore Indexes

Erik Darling found a way to do something interesting:

If you read the documentation for column store indexes, it says that column store indexes can’t be created on persisted computed columns.

And that’s true. If we step through this script, creating the column store index will fail.

But it turns out that if there’s a will, there’s a way. Even if this is something you shouldn’t wish to do because who knows what it will mess up.

Comments closed

A Primer on Columnstore Indexes

Gail Shaw gives us an introduction to columnstore indexes:

Columnstores are… different.

The first, and I would say most important thing to realise about columnstore indexes is that they don’t have keys. These are not seekable indexes. Any access to a columnstore index is going to be a scan.

Instead of storing the rows together on a page, a columnstore index instead stores column values together. The rows in the table are divided into chunks of max a million rows, called a row group, and the columns are then stored separately, in what are called segments. A segment will only ever contain one column’s values.

Read the whole thing.

Comments closed

Columnstore, Strings, and Windowing Functions

Erik Darling has a tale to tell:

The only columns that we were really selecting from the Comments table were UserId and CreationDate, which are an integer and a datetime.

Those are relatively easy columns to deal with, both from the perspective of reading and sorting.

In order to show you how column selection can muck things up, we need to create a more appropriate column store index, add columns to the select list, and use a where clause to  restrict the number of rows we’re sorting. Otherwise, we’ll get a 16GB memory grant for every query.

Read on to see how one little (or, well, big) string column can foul up the whole works.

Comments closed

Query Plans and Window Functions

Erik Darling has a two-fer here. First, window functions and parallelism:

When windowing functions don’t have a Partition By, the parallel zone ends much earlier on than it does with one.

That doesn’t mean it’s always slower, though. My general experience is the opposite, unless you have a good supporting index.

But “good supporting index” is for tomorrow. You’re just going to have to deal with that.

Second, columnstore behavior with respect to window functions:

Not only is the parallel version of the row mode plan a full second slower, but… look at that batch mode plan.

Look at it real close. There’s a sort before the Window Aggregate, despite reading from the same nonclustered index that the row mode plan uses.

But the row mode plan doesn’t have a Sort in it. Why?

Check out both posts.

Comments closed

Columnstore in Standard Edition

Erik Darling looks at how powerful (or not) columnstore indexes are in SQL Server Standard Edition:

The top plan is from Standard Edition, and runs for a minute in a full serial plan. There is a non-parallel plan reason in the operator properties: MaxDOPSetToOne.

I do not have DOP set to one anywhere, that’s just the restriction kicking in. You can try it out for yourself if you have Standard Edition sitting around somewhere. I’m doing all my testing on SQL Server 2019 CU9. This is not ancient technology at the time of writing.

The bottom plan is from Enterprise/Developer Edition, where the the plan is able to run partially in parallel, and takes 28 seconds (about half the time as the serial plan).

You get what you pay for in this case.

Comments closed

Columnstore Query Patterns

Ed Pollack walks us through some query patterns which do and don’t work very well with columnstore indexes:

Reading data from a highly compressed analytical structure is quite different from the query patterns used on transactional data. By leveraging metadata, data order, segment elimination, and compression, large tables can be quickly read and results returned in seconds (or less!).

Taking this further, read queries against columnstore indexes can be further optimized by simplifying queries and providing the query optimizer with the easiest path to the smallest columnstore scans needed to return results.

This article explores the most efficient ways to read a columnstore index and produce guidelines and best practices for analytics against large columnstore data structures.

Read on for good advice.

Comments closed

Testing Columnstore Data Loads on Eight-Socket Servers

Joe Obbish puts on the lab coat and safety goggles:

I elected to use a high concurrency CCI insert workload to compare performance between a four socket VM and an eight socket VM. Quite conveniently, I already had a test columnstore workload that I knew pushed the SQL Server scalability limits in terms of memory management. To perform the threading I used the SQL Server Multi Thread open source framework. I wanted all sessions to go to their own schedulers. That could have been tough to manage with tests up to 200 threads but the threading framework handles that automatically.

For those following along at home, testing was done with SQL Server 2019 with LPIM and TF 876 enabled. Guest VMs were built with VMware with Windows Server 2019 installed. The four and eight socket VMs were created on the same physical host with about 5.5 TB of RAM available to the guest OS in both configurations.

Read on to see how an eight-socket server fared in comparison to a four-socket server in this task.

Comments closed