Columnstore Elimination

Sunil Agarwal has a two-part series on columnstore data elimination.  First up is column elimination:

Now, let us run the same query on the table with clustered columnstore index as shown in the picture below. Note, that the logical IOs for the LOB data is reduced by 3/4th for the second query as only one column needs to be fetched. You may wonder why LOB? Well, the data in each column is compressed and then is stored as BLOB. Another point to note is that the query with columnstore index runs much faster, 25x for the first query and 4x for the second query.

Next up is rowgroup elimination:

In the context of rowgroup elimination, let us revisit the previous example with sales data

  • You may not even need partitioning to filter the rows for the current quarter as rows are inserted in the SalesDate order allowing SQL Server to pick the rowgroups that contain the rows for the requested date range.
  • If you need to filter the data for a specific region within a quarter, you can partition the columnstore index at quarterly boundary and then load the data into each partition after sorting on the region. If the incoming data is not sorted on region, you can follow the steps (a) switch out the partition into a staging table T1 (b) drop the clustered columnstore index (CCI) on the T1 and create clustered btree index on T1 on column ‘region’ to order the data (c) now create the CCI while dropping the existing clustered index. A general recommendation is to create CCI with DOP=1 to keep the prefect ordering.

From these two articles, queries which hit a small percentage of columns and stick to a relatively small number of rowgroups will likely perform better.  For people who understand normal B-tree indexes, the second point seems clear enough, but the first point is at least as important.

Related Posts

Nested Loops, Hash, Or Merge: Which Is Best?

Grant Fritchey dodges the important questions: First response, also a joke, was the question at the title of this post: What is the preferred operator when joining tables: Hash Match, Nested Loops or Merge? While my immediate response to this question is, yes. Meaning, they’re all preferred, situationally. I decided to expand on that a […]

Read More

Parallelism Strategies For Grouping Operations

Itzik Ben-Gan continues his series on grouping data in SQL Server by looking at how these operations can go parallel: Besides needing to choose between various grouping and aggregation strategies (preordered Stream Aggregate, Sort + Stream Aggregate, Hash Aggregate), SQL Server also needs to choose whether to go with a serial or a parallel plan. […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031