I need to find a relatively efficient way to advance the CSILOCATOR because I need to do it over 2 billion times, if my theory is correct about the maximum allowed value. Both updating all of the rows in a delta rowgroup and deleting and reinserting advance the CSILOCATOR. I expected that small batch sizes would work best, and they did. For my table’s schema, the sweet spot for updates is about 275 rows and the sweet spot for delete/inserts is about 550 rows. Delete/inserts appeared to be faster than updates for the purpose of constantly reloading the same rows over and over.
Great post, Brent!
The first error above would be seen if you have a SQL Server Agent job that updates statistics. The second error is how it looks in an SSMS Query window.
The error message claims that
UPDATE STATISTICScan only be used on a columnstore index with the
STATS_STREAMoption. However, the Microsoft Docs UPDATE STATISTICS shows very “thin” documentation for the option, showing only these two tidbits:
Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.
Columnstore indexes really don’t want their stats updated, apparently, and will fight you tooth and nail to prevent it.
Why should we care about memory grant timeouts for CCI insert queries? Simply put, lots of bad things can happen when those queries can time out, both for serial and for parallel inserts. For serial insert queries, I’ve observed deadlocks, extremely poor performance along with long SLEEP_TASK waits, and extremely long rollbacks. For parallel insert queries, I’ve observed queries that run seemingly forever, poor performance of the SELECT part, and error 8645. You probably don’t want any of that occurring in production. It would be very helpful if it was possible to extend the 25 second time-out for queries that insert into columnstore tables.
Read through as Joe learns the true meaning of
Christmas a KB article.
Consider a query execution that meets all of the following criteria:
1. A parallel INSERT INTO… SELECT into a columnstore table is performed
2. The SELECT part of the query contains a batch mode hash join
3. The query can’t immediate get a memory grant, hits the 25 second memory grant timeout and executes with required memory
The query may appear to get stuck.
Click through for Joe’s demo. The fix? Update to SQL Server 2017 CU11.
Error due to online statement
It’s a very common error caused usually by somebody copying a rowstore “Create Index” command. The “online=on” option you use with rowstore indexes does not work with creating columnstore indexes yet. For the record the online option will work in SQL Server 2019.
This one I’ve run into, as I like clustered columnstore indexes a lot but occasionally need single-row results from them. If the table is empty, creating an index offline is no problem. But once you get a billion or so rows in it, that’s a non-starter in a 24×7 uptime shop. Read the whole thing.
In the last year or so, with a large customer who makes fairly heavy use of this pattern, I’ve noticed another concern. Sometimes, and I can’t figure out what exactly triggers it, the execution plan generated, will do a seek against the nonclustered index and then do a key lookup against the columnstore as seen below. This is bad for two reasons–first the key lookup is super expensive, and generally columnstores are very large, secondly this key lookup is in row execution mode rather than batch and drops the rest of the execution plan into row mode, thus slowing the query down even further.
Joey also has a UserVoice item as well, so check it out.
I have a client that used Itzik Ben-Gan’s solution of creating a filtered nonclustered columnstore index to achieve batch mode on a rowstore (in fact I proposed that the client consider it). They have an OLTP system, and often perform YTD calculations. When they tested, processing time was reduced by 30 to 50 percent, without touching a single line of application code. If that ain’t low hanging fruit, I don’t know what is —
However, during testing, I noticed some intermittent blocking that didn’t make sense to me. But I couldn’t nail it down, and they went live with the “filtered nonclustered columnstore index” solution.
Once they deployed – and there was a lot of concurrency – I could see what had eluded me during my proof of concept: blocking in tempdb.
Read on for the repro and check out Ned’s UserVoice bug report.
I have a huge love for the DBCC CLONEDATABASE command – it has been made available (backported) to every SQL Server version starting with SQL Server 2012, since the original release in SQL Server 2014, while being constantly improved in the Service Packs and Cumulative Updates.
This blog post is focusing on the Database Cloning improvement in the SQL Server 2019 that is already available in the public CTP 2.0 – the possibility of the automated statistics extraction for the Columnstore Indexes.
Well, there was quite a significant problem with the Columnstore Indexes previously – the statistics for them were not extracted into the cloned database, unless you did created the statistics in the most recent step before Database cloning.
Click through for more details and a comparison between SQL Server versions.
The authors conducted a series of microbenchmarks as follows:
scans with single predicates with varying selectivity to study the trade-off between the range scan of a B+ tree vs a columnstore scan
sort and group-by queries to study the benefit of the sort order supported by B+ trees (columnstores in SQL Server are not sorted).
update statements with varying numbers of updated rows to analyze the cost of updating the different index types
mixed workloads with different combinations of reads and updates
It’s interesting to read an academic paper covering the topic, particularly when you can confirm that it works well in practice too.
The final results were pretty much amazing – 2.1 GB, making the estimation of the sys.sp_estimate_data_compression_savings System Stored Procedure much more precise then my own function! This leaves me very happy and makes me want to investigate and learn how this new stored procedure is capable of providing better estimations.
I decided to test on the other tables within TPCH database and my test on the Orders table have shown a different situation where the 0.7 GB estimation of the sys.sp_estimate_data_compression_savings Stored Procedure were pretty much offbeat when comparing to the CISL dbo.cstore_sp_estimate_columnstore_compression_savings – showing 0.92 GB while the end result was 0.89 GB.
I guess the final answer is that it will depend, but that the estimation of the new stored procedure are not totally offbeat is an incredibly good sign, though I would still keep an eye or even two on the provided estimated results.
Read the whole thing for more details on these examples.