Press "Enter" to skip to content

Category: Indexing

Multidimensional Bloom Filters

The Instaclustr team talks bloom filters:

Bloom filters are space-efficient probabilistic data structures that can yield false positives but not false negatives. They were initially described by Burton Bloom in his 1970 paper  “Space/Time Trade-offs in Hash Coding with Allowable Errors“. They are used in many modern systems including the internals of the Apache® projects Cassandra®, Spark™, Hadoop®, Accumulo®, ORC™, and  Kudu™.

Multidimensional Bloom filters are data structures to search collections of Bloom filters for matches. The simplest implementation of a Multidimensional Bloom filter is a simple list that is iterated over when searching for matches. For small collections (n < 1000) this is the most efficient solution. However, when working with collections at scale other solutions can be more efficient. 

Read on to learn more, including some discussion about an implementation in Cassandra.

Comments closed

Finding Queries with Missing Index Requests in SQL Server 2019

Erik Darling shows off a nicety in SQL Server 2019 and later:

Note that this script does not assemble the missing index definition for you. That stuff is all readily available in the query plans that get returned here, and of course the missing index feature has many caveats and limitations to it.

You should, as often as possible, execute the query and collect the actual execution plan to see where the time is spent before adding anything in.

Read on for the script.

Comments closed

Understanding Missing Index Impact

Erik Darling delves into the depths of missing indexes:

Breaking each of those down, the only one that has a concrete meaning is Uses, but that of course doesn’t mean that a query took a long time or is even terribly inefficient.

That leaves us with Average Query Cost, which is the sum of each operator’s estimated cost in the query plan, and Impact.

But where does Impact come from?

Read on to learn where, as well as why you shouldn’t blindly trust that number.

Comments closed

Finding Indexing Metrics in Cosmos DB

Hasan Savran looks at the numbers:

You might need Composite Indexes to make your queries more efficient, Cosmos DB does not create any Composite Indexes for you. You need to figure out which properties should have composite indexes then you need to change the indexing policy file to create them. 

    Indexing Metrics comes to your help when you need help with indexing policy. It tells you which indexes the current query uses and it gives you hints about what other indexes you should create to make the query work faster/cheaper. Like many other features of Cosmos DB, you need to write code by using SDK to see Indexing Metrics. The following example shows how to enable Indexing Metrics for your queries.

Click through for a code sample which shows how to collect index metrics.

Comments closed

An Index for Change Tracking Cleanup

James Ferebee creates an index:

If the issue persists and we stack a lot of data that needs to be removed which can be too much for autocleanup to manage. At that point, we recommend running (as is indicated in the error message) sp_flush_CT_internal_table_on_demand which is discusseed in detailed in Amit’s blog post Change Tracking Cleanup–Part 1.

I will not discuss the internals of the process as Amit Banerjee already touched on this in the referenced blog. However if you are routinely encountering issues where autocleanup can’t keep up and/or manual cleanup is taking significant time, you can add the index and see if it helps and I have the process enumerated below. Keep in mind this is not guaranteed to fix all cleanup issues and it may still be necessary to run manual cleanup regularly. If you continue to have issues with cleanup feel free to create a ticket with us here at CSS to assist you and get specific data to your environment.

Read on for the index definition as well as some important notes about whether you might need it.

Comments closed

Optimizing Index Spools

Francisco looks at index spools:

When we are analyzing execution plans, we may come across different types of Spool operators – Table Spools, Row Count Spools, Window Spools or Index Spools – that the Query Optimizer chooses for specific purposes. In this post we are going to briefly look into the Index Spool, how it can sometimes lead to suboptimal query performance, and what can be done to easily fix it.

My favorite description of this is Erik Darling’s: spools are SQL Server’s passive-aggressive way of telling you “I’m not saying you need an index but you need an index.”

Comments closed

Consolidating Indexes

Erik Darling runs through an exercise:

The more columns you have in a table, the more potential column combinations there are for indexes. Much like columns, indexes tend to get added following the path of least resistance.

Very rarely does someone consider current indexes when deciding to add an index.

Erik’s process is a good one. The real pain comes when there are 40-50 indexes on a table (seriously…) and there are a lot of similar-but-not-quite-similar-enough indexes.

Comments closed

The Practical Costs of Index Fragmentation

Tibor Karaszi digs into index performance:

See numbers and diagrams at the end, or at the top. I measured a few cases: the difference between no external fragmentation and severe external fragmentation (over 99%). I have both a narrow index and a wide index, and I read one (1), 10,000 and 100,000 rows using index searches (“range scan”). There were obviously no difference reading 1 row so I exclude that from my discussion below. For the other cases the extra time with an extreme level of external fragmentation is (from lowest impact to highest) 7%, 10%, 13% and 32%. The highest number (32%) is when reading many rows from a narrow index, i.e. many rows per page. Again, this is with an extreme level of fragmentation.

What’s interesting is that for the most part, there’s a negligible difference between ~0% internal fragmentation and ~99% internal fragmentation. The follow-on question is, how much are defrag operations costing you in performance and when is the benefit worth the cost?

Comments closed

The Upsert Pattern and Unique Indexes in SQL Server

Michael J Swart recommends a unique index:

To avoid deadlocks when implementing the upsert pattern, make sure the index on the key column is unique. It’s not sufficient that all the values in that particular column happen to be unique. The index must be defined to be unique, otherwise concurrent queries can still produce deadlocks.

Read on for Michael’s thoughts as well as some smart commenters adding on.

Comments closed