Columnstore Elimination

Sunil Agarwal has a two-part series on columnstore data elimination.  First up is column elimination:

Now, let us run the same query on the table with clustered columnstore index as shown in the picture below. Note, that the logical IOs for the LOB data is reduced by 3/4th for the second query as only one column needs to be fetched. You may wonder why LOB? Well, the data in each column is compressed and then is stored as BLOB. Another point to note is that the query with columnstore index runs much faster, 25x for the first query and 4x for the second query.

Next up is rowgroup elimination:

In the context of rowgroup elimination, let us revisit the previous example with sales data

  • You may not even need partitioning to filter the rows for the current quarter as rows are inserted in the SalesDate order allowing SQL Server to pick the rowgroups that contain the rows for the requested date range.
  • If you need to filter the data for a specific region within a quarter, you can partition the columnstore index at quarterly boundary and then load the data into each partition after sorting on the region. If the incoming data is not sorted on region, you can follow the steps (a) switch out the partition into a staging table T1 (b) drop the clustered columnstore index (CCI) on the T1 and create clustered btree index on T1 on column ‘region’ to order the data (c) now create the CCI while dropping the existing clustered index. A general recommendation is to create CCI with DOP=1 to keep the prefect ordering.

From these two articles, queries which hit a small percentage of columns and stick to a relatively small number of rowgroups will likely perform better.  For people who understand normal B-tree indexes, the second point seems clear enough, but the first point is at least as important.

Related Posts

Causing Error 666 When Loading Into Columnstore Index

Joe Obbish has moved into Erik Darling’s Internet Basement and has a doozy of a first post there: I need to find a relatively efficient way to advance the CSILOCATOR because I need to do it over 2 billion times, if my theory is correct about the maximum allowed value. Both updating all of the […]

Read More

Blocked Queries and the Plan Cache

Erik Darling lays out an argument why execution time isn’t always a great measure for finding bad queries: Most scripts (even ones I’ve worked on!) that look at the plan cache, have had the ability to sort it by different metrics: CPU, reads, writes, duration, etc. A lot of people are very interested in long […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031