Looking at the execution plan, SQL Server decided to scan the non-clustered columnstore index, even though it doesn’t contain the computed column BirthYear! This surprised me, because I have a plain old non-clustered index on BirthYear which covers the query as well. I guess the optimizer is really excited about that nonclustered columnstore.
Kendra links to a Connect item from Niko Neugebauer to add persisted computed columns to columnstore indexes.
This adjustment might take place based of the inedequacy of the estimated number of rows (which is based on the statistics available at the time of the execution time generation) related to the real execution number of rows that the iterator/query is processing.
The 2 possible adjustment scenarios are:
– when estimated number of rows is too high and the memory is granted to the query, even though the query itself will not use it.
– when estimated number of rows is too low and the memory operations such as hashing or sorting will not have enough space to fit the complete data sets, thus making them spill on to the TempDB (temporary storing the data while doing the work, based on the lacking of the available memory to the query).
Read on for details, but one interesting caveat is that this doesn’t change anything for the first run; it only updates requests on subsequent runs, so it benefits most from consistent workloads with significant plan re-use. That said, it looks extremely useful.
Turns out the majority of the rows belonged to the second scenario. Whoops. The initial run took a little over 20 hours. Not exactly rocket speed. The problem was that for each period, a large number of rows in the clustered columnstore index (CCI) had to be updated, just to set the range of the interval. Updates in a CCI are expensive, as they are split into inserts and deletes. Doing so many updates resulted in a heavily fragmented CCI and with possibly too many rows in the delta storage (which is row storage).
Read the whole thing. Koen links to a Niko Neugebauer post, which you should also read. After that, read my warning on trickle loading. The major querying benefits you get from clustered columnstore indexes is great, but it does come at a cost when you aren’t simply inserting new rows.
In the upcoming version of SQL Server (for the moment known as SQL Server vNext), Microsoft has finally announced the upcoming support for the LOBs within Columnstore Indexes – thus enabling the usage of the NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX) data types on the tables with Columnstore Indexes that include those columns.
For the tests, I have decided to spin a Virtual Machine in Azure with an installation of the currently available CTP1 of the SQL Server vNext, which has a version 220.127.116.11.
Read the whole thing. It’s hard to tell at this point if these are bugs, incomplete functionality, or what, so it’ll be interesting to track changes over the CTPs.
Given the improvements and the availability of the of the programability surface for every edition (with some insignificant & logical limitations) that I have blogged about in
SQL Server 2016 SP1 – Programmability Surface for everyone!, I believe everyone using Microsoft Data Platform has rejoyced greatly. Of course, now everyone can have Columnstore Indexes on every SQL Server edition!
There are some noticeable limitations that were announced right from the start, such as the maximum size of the Columnstore Object Pool (you can find more information about it here – Columnstore Indexes – part 38 (“Memory Structures”)), but there are more limitations to the Standard Editions and inferior ones and it is extremely important to know them, to understand them in order to make the right decision – when your Business is ready/needed to upgrade to the Enterprise Edition of the SQL Server.
If you’re on Standard Edition and excited about using Columnstore, do read Niko’s post. Columnstore won’t work as fast as it does on Enterprise Edition (gotta have a reason to upgrade) but based on what he’s shown thus far, Columnstore is still a good reason to upgrade to 2016 SP1 if you’re on Standard Edition.
With an incredible Service Pack 1 Microsoft has triumphantly announced that all editions (Standard, Web, Express and even Local) will get the most advanced programming capabilities of Columnstore, In-Memory, Database Snapshot, Compression, Partition & many others, plus that there are some incredible features for the T-SQL (CREATE OR ALTER) and Execution Plan details (Actual Information on processed Rows, used Trace Flags, etc),
but at the same time there are some quite important improvement under the hood that will make you want to use Service Pack 1 for SQL Server 2016 immediately. One of this features is the fast addition of the Columnstore Index to the Memory-Optimised tables. Let take it to the test by restoring a copy of the ContosoRetailDW free database:
These results look nice.
In this design, I have data coming from the transactional system, undergoing some ETL processing, and going into a staging table on the warehouse. From there, I perform the remainder of the ETL work and insert into a rowstore table. This rowstore table has the same attribute names and data types as the columnstore table, but instead of having a clustered columnstore index, it has a standard B-tree index and can have additional non-clustered indexes. From there, I expose the combination table using a view which simply unions the two sets of data so the application doesn’t have to see rowstore versus columnstore tables.
If you’re doing nightly insertions, results tend to be a lot better. But if you need close-to-but-not-quite real-time data, there are still ways to solve the problem.
These two filtered indexes are very different – and the SQL Server optimizer can use them very differently!
While classic filtered nonclustered rowstore indexes must reliably “cover” parts of the query to be used to the optimizer, filtered nonclustered columnstore indexes may be combined with other indexes to produce a plan returning a larger range of data.
This sounds a little weird. I’ll show you what I mean using the WideWorldImporters database.
Kendra exposes an interesting difference in the two types of index and a case where filtered indexes simply fail (though that’s not a situation you want to be in anyhow!).
Another happy release of the CISL (Columnstore Indexes Script Library) is live – this time it is 1.4.0!
This release is focusing on the addition of the Extended Events, so that a user of CISL can easily set up the events for each of the SQL Server (2012,2014,2016) or Azure SQL Database versions.
This is an open source library which I recommend if you deal with columnstore indexes in any fashion.
I would like to dedicate this blog post to the Memory-Optimised (also known and LOVED as Hekaton) Columnstore Indexes and their limitations in SQL Server 2016.
Disclaimer: the Memory-Optimised Technology is the ground-breaking development, which will be truly appreciated only in the next couple of years, and it has its incredible use cases (and maybe I will be blogging more about this space in the next couple of months), but people needs to understand that mapping InMemory Columnstore Indexes to disk-based Columnstore Indexes 1:1 is a very wrong idea, and that because InMemory technology is significantly younger and less stable than Columnstore Indexes – there are some very significant hidden cornerstones.
It’s important to read this post as “this is not yet a fully-mature product” rather than “this will always be worse.” But it’s just as important to understand the limitations of the product and not think you’re getting something that you aren’t.