I mentioned in the introduction that not all eight-byte data types can fit in 64 bits. This fact is important because many columnstore and batch mode performance optimizations only work with data 64 bits in size. Aggregate pushdown is one of those things. There are many more performance features (not all documented) that work best (or at all) only when the data fits in 64 bits.
In our specific example, aggregate pushdown is disabled for a columnstore segment when it contains even one data value that does not fit in 64 bits. SQL Server can determine this from the minimum and maximum value metadata associated with each segment without checking all the data. Each segment is evaluated separately.
Paul goes deep into the concept, making this well worth your while.
What tier do you need to create one of these things? Let’s see.
CLUSTERED COLUMNSTORE INDEX
( ONLINE = ON
But I get this message, Msg 40536, Level 16, State 32, Line 1
‘COLUMNSTORE’ is not supported in this service tier of the database. See Books Online for more details on feature support in different service tiers of Windows Azure SQL Database.
Read on to see the minimum tier which allows online creation of clustered columnstore indexes.
Using AdventureworksDW2016CTP3 we will work with the FactResellerSalesXL table which has 11.6 million rows in it. The simple query we will use as a demo just selects the ProductKey and returns some aggregations grouping them by the different product keys.
First, we will run the query with no existing columnstore index and only using the current clustered rowstore (normal) index. Note that I turned on SET STATISTICS IO and TIME on. These two SET statements will help us better illustrate the improvements provided by the columnstore index. SET STATISTICS IO displays statistics on the amount of page activity generated by the query. It gives you important details such as page logical reads, physical reads, scans, and lob reads both physical and logical. SET STATISTICS TIME displays the amount of time needed to parse, compile, and execute each statement in the query. The output shows the time in milliseconds for each operation to complete. This allows you to really see, in numbers, the differences.
Click through for the example.
Now, I admit when these first were introduced in SQL Server 2012 I found them very intimidating (additionally, you couldn’t update them directly). For me, anytime you say columnstore, my mind tends to set off alarms saying wait stay away, this is too complicated. So, in this post I am going to try and simplify the feature for you.
To do that first you need to understand some terminology and the difference between a columnstore index and a row store index (the normal kind we all use). Let’s start with the terminology.
There are some interesting complications around columnstore indexes but for analytical or warehousing queries, they’re excellent.
The Columnstore Index Scan is not really an actual operator. You can encounter it in graphical execution plans in SSMS (and other tools), but if you look at the underlying XML of the execution plan, you will see that it is either an Index Scan or a Clustered Index Scan operator.
SQL Server currently supports three types of index storage: rowstore, columnstore, and memory-optimized. Indexes of each of those types can be the target of an Index Scan or Clustered Index Scan, as indicated by the Storage property. When the Storage property is RowStore or MemoryOptimized, then the normal icon for (clustered) index scan is use, but when Storage is ColumnStore than SSMS (and other tools) choose to show a different icon instead.
Click through for more details.
I need to find a relatively efficient way to advance the CSILOCATOR because I need to do it over 2 billion times, if my theory is correct about the maximum allowed value. Both updating all of the rows in a delta rowgroup and deleting and reinserting advance the CSILOCATOR. I expected that small batch sizes would work best, and they did. For my table’s schema, the sweet spot for updates is about 275 rows and the sweet spot for delete/inserts is about 550 rows. Delete/inserts appeared to be faster than updates for the purpose of constantly reloading the same rows over and over.
Great post, Brent!
The first error above would be seen if you have a SQL Server Agent job that updates statistics. The second error is how it looks in an SSMS Query window.
The error message claims that
UPDATE STATISTICScan only be used on a columnstore index with the
STATS_STREAMoption. However, the Microsoft Docs UPDATE STATISTICS shows very “thin” documentation for the option, showing only these two tidbits:
Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.
Columnstore indexes really don’t want their stats updated, apparently, and will fight you tooth and nail to prevent it.
Why should we care about memory grant timeouts for CCI insert queries? Simply put, lots of bad things can happen when those queries can time out, both for serial and for parallel inserts. For serial insert queries, I’ve observed deadlocks, extremely poor performance along with long SLEEP_TASK waits, and extremely long rollbacks. For parallel insert queries, I’ve observed queries that run seemingly forever, poor performance of the SELECT part, and error 8645. You probably don’t want any of that occurring in production. It would be very helpful if it was possible to extend the 25 second time-out for queries that insert into columnstore tables.
Read through as Joe learns the true meaning of
Christmas a KB article.
Consider a query execution that meets all of the following criteria:
1. A parallel INSERT INTO… SELECT into a columnstore table is performed
2. The SELECT part of the query contains a batch mode hash join
3. The query can’t immediate get a memory grant, hits the 25 second memory grant timeout and executes with required memory
The query may appear to get stuck.
Click through for Joe’s demo. The fix? Update to SQL Server 2017 CU11.
Error due to online statement
It’s a very common error caused usually by somebody copying a rowstore “Create Index” command. The “online=on” option you use with rowstore indexes does not work with creating columnstore indexes yet. For the record the online option will work in SQL Server 2019.
This one I’ve run into, as I like clustered columnstore indexes a lot but occasionally need single-row results from them. If the table is empty, creating an index offline is no problem. But once you get a billion or so rows in it, that’s a non-starter in a 24×7 uptime shop. Read the whole thing.