Indexing – Page 12 – Curated SQL

Ordered Columnstore Indexes

Published 2022-07-14 by Kevin Feasel

Joe Obbish and Erik Darling tag team on this one. First, Joe looks at the details of what the CCI ordering process does:

The sort for inserting into an ordered columnstore is a DML request sort. It appears to use the same internal mechanism as the sort that’s added for inserting into partitioned columnstore tables. The difference is that the data is sorted by the specified columns instead of a calculated partition id. In my testing, the sort appears to be a best effort sort that does not spill to tempdb. This means that if SQL Server thinks there won’t be enough memory then the data will not be fully sorted. Parallel inserts have an additional complication.

And Erik has a messy work-around:

Anyway, I decided to dig in and see what was going on behind the scenes. Which of course, means query plans, and bothering people who are really good at debuggers.
Most of the problems that you’ll run into in SQL Server will come from sorting data.
Whenever I have to think about Sorts, I head to this post about all the different Sorts you might see in a query plan.

Definitely read both posts.

Comments closed

Indexes: Gotta Add ‘Em All

Published 2022-07-14 by Kevin Feasel

Andy Yun tells a tale of woe:

I once worked for an Entity Framework SaaS company that was having horrific T-SQL performance issues. We put out a major release one weekend and spent the subsequent week buried in horrible performance hell. Because all T-SQL code was generated from Entity Framework (stored procedures were banned by a development exec prior to me joining the company), there were no easy fixes that could be implemented (and a Production release typically took 12 – 36 hours, I wish I was joking).
The manager of infrastructure had heard about SQL Server’s missing index recommendations:

Yep, it ends about as poorly as you’d expect.

Comments closed

Ordered Columnstore Indexes in SQL Server 2022

Published 2022-07-07 by Kevin Feasel

Brent Ozar appreciates order:

So essentially, every column has a whole bunch of indexes on it.
But there’s no order whatsoever as to which rows end up in which index.
This isn’t a problem for relatively small tables, but as you get to billion-row data warehouse fact tables where columnstore should really shine, performance gradually degrades. In data warehouses, fact tables often have a commonly filtered column, like SaleDate. However, until SQL Server 2022, even if you wanted a small SaleDate range, your query would likely check hundreds or thousands of row groups, each of which had a huge range of data.

But do read the whole thing, as it seems it’s not working correctly in CTP 2.0 of SQL Server 2022. It is quite useful in Azure Synapse Analytics dedicated SQL pools, at least—that I can confirm.

Comments closed

Multidimensional Bloom Filters

Published 2022-06-17 by Kevin Feasel

The Instaclustr team talks bloom filters:

Bloom filters are space-efficient probabilistic data structures that can yield false positives but not false negatives. They were initially described by Burton Bloom in his 1970 paper “Space/Time Trade-offs in Hash Coding with Allowable Errors“. They are used in many modern systems including the internals of the Apache® projects Cassandra®, Spark™, Hadoop®, Accumulo®, ORC™, and Kudu™.
Multidimensional Bloom filters are data structures to search collections of Bloom filters for matches. The simplest implementation of a Multidimensional Bloom filter is a simple list that is iterated over when searching for matches. For small collections (n < 1000) this is the most efficient solution. However, when working with collections at scale other solutions can be more efficient.

Read on to learn more, including some discussion about an implementation in Cassandra.

Comments closed

Finding Queries with Missing Index Requests in SQL Server 2019

Published 2022-06-06 by Kevin Feasel

Erik Darling shows off a nicety in SQL Server 2019 and later:

Note that this script does not assemble the missing index definition for you. That stuff is all readily available in the query plans that get returned here, and of course the missing index feature has many caveats and limitations to it.
You should, as often as possible, execute the query and collect the actual execution plan to see where the time is spent before adding anything in.

Read on for the script.

Comments closed

Understanding Missing Index Impact

Published 2022-06-03 by Kevin Feasel

Erik Darling delves into the depths of missing indexes:

Breaking each of those down, the only one that has a concrete meaning is Uses, but that of course doesn’t mean that a query took a long time or is even terribly inefficient.
That leaves us with Average Query Cost, which is the sum of each operator’s estimated cost in the query plan, and Impact.
But where does Impact come from?

Read on to learn where, as well as why you shouldn’t blindly trust that number.

Comments closed

Finding Indexing Metrics in Cosmos DB

Published 2022-05-20 by Kevin Feasel

Hasan Savran looks at the numbers:

You might need Composite Indexes to make your queries more efficient, Cosmos DB does not create any Composite Indexes for you. You need to figure out which properties should have composite indexes then you need to change the indexing policy file to create them.
Indexing Metrics comes to your help when you need help with indexing policy. It tells you which indexes the current query uses and it gives you hints about what other indexes you should create to make the query work faster/cheaper. Like many other features of Cosmos DB, you need to write code by using SDK to see Indexing Metrics. The following example shows how to enable Indexing Metrics for your queries.

Click through for a code sample which shows how to collect index metrics.

Comments closed

An Index for Change Tracking Cleanup

Published 2022-05-20 by Kevin Feasel

James Ferebee creates an index:

If the issue persists and we stack a lot of data that needs to be removed which can be too much for autocleanup to manage. At that point, we recommend running (as is indicated in the error message) sp_flush_CT_internal_table_on_demand which is discusseed in detailed in Amit’s blog post Change Tracking Cleanup–Part 1.
I will not discuss the internals of the process as Amit Banerjee already touched on this in the referenced blog. However if you are routinely encountering issues where autocleanup can’t keep up and/or manual cleanup is taking significant time, you can add the index and see if it helps and I have the process enumerated below. Keep in mind this is not guaranteed to fix all cleanup issues and it may still be necessary to run manual cleanup regularly. If you continue to have issues with cleanup feel free to create a ticket with us here at CSS to assist you and get specific data to your environment.

Read on for the index definition as well as some important notes about whether you might need it.

Comments closed

Defining Index Seeks Down

Published 2022-05-05 by Kevin Feasel

Brent Ozar explains that index seeks aren’t always as selective as they sound:

When you see “index seek” on an execution plan, that doesn’t mean SQL Server is jumping to exactly the row you’re looking for. It only means that SQL Server is seeking on the first column of the index.
This is especially misleading on indexes where the first column isn’t very selective.

Definitely worth the read.

Comments closed

Optimizing Index Spools

Published 2022-04-22 by Kevin Feasel

Francisco looks at index spools:

When we are analyzing execution plans, we may come across different types of Spool operators – Table Spools, Row Count Spools, Window Spools or Index Spools – that the Query Optimizer chooses for specific purposes. In this post we are going to briefly look into the Index Spool, how it can sometimes lead to suboptimal query performance, and what can be done to easily fix it.

My favorite description of this is Erik Darling’s: spools are SQL Server’s passive-aggressive way of telling you “I’m not saying you need an index but you need an index.”

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Indexing