Columnstore Elimination

Sunil Agarwal has a two-part series on columnstore data elimination.  First up is column elimination:

Now, let us run the same query on the table with clustered columnstore index as shown in the picture below. Note, that the logical IOs for the LOB data is reduced by 3/4th for the second query as only one column needs to be fetched. You may wonder why LOB? Well, the data in each column is compressed and then is stored as BLOB. Another point to note is that the query with columnstore index runs much faster, 25x for the first query and 4x for the second query.

Next up is rowgroup elimination:

In the context of rowgroup elimination, let us revisit the previous example with sales data

  • You may not even need partitioning to filter the rows for the current quarter as rows are inserted in the SalesDate order allowing SQL Server to pick the rowgroups that contain the rows for the requested date range.
  • If you need to filter the data for a specific region within a quarter, you can partition the columnstore index at quarterly boundary and then load the data into each partition after sorting on the region. If the incoming data is not sorted on region, you can follow the steps (a) switch out the partition into a staging table T1 (b) drop the clustered columnstore index (CCI) on the T1 and create clustered btree index on T1 on column ‘region’ to order the data (c) now create the CCI while dropping the existing clustered index. A general recommendation is to create CCI with DOP=1 to keep the prefect ordering.

From these two articles, queries which hit a small percentage of columns and stick to a relatively small number of rowgroups will likely perform better.  For people who understand normal B-tree indexes, the second point seems clear enough, but the first point is at least as important.

Related Posts

Iterative Solutions To The Closest Match Problem

Itzik Ben-Gan has a follow-up article looking at row-by-row solutions to the closest match problem: Last month, I covered a puzzle involving matching each row from one table with the closest match from another table. I got this puzzle from Karen Ly, a Jr. Fixed Income Analyst at RBC. I covered two main relational solutions that […]

Read More

Speeding Up The First Responder Power BI Interface

Kellyn Pot’vin-Gorman hits the Go Faster button: The gist of this kit is that it is a database repository as part of the sp_BlitzFirst to collect monitoring alerting and performance metric data. Once you’ve set this up, then you can use a Power BI desktop dashboard as an interface for all that data.Now this is an awesome […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031