Index fragmentation removal and prevention has long been part of normal database maintenance, not only in SQL Server but across many platforms. Index fragmentation hurts performance for several reasons, and most discussion focuses on the random, small-block I/O it can cause on physical, disk-based storage. The usual concern is that fragmentation slows scans by limiting the size of read-ahead I/Os. Based on this limited understanding of the problems fragmentation causes, some people have begun circulating the idea that index fragmentation doesn’t matter on Solid State Storage devices (SSDs) and that you can simply ignore it going forward.
However, that is not the case for a number of reasons. This article will explain and demonstrate one of those reasons: that index fragmentation can adversely impact execution plan choice for queries. This occurs because index fragmentation generally leads to an index having more pages (these extra pages come from page split operations, as described in this post on this site), and so the use of that index is deemed to have a higher cost by SQL Server’s query optimizer.
Let’s look at an example.
Check out the example, but definitely read the comments as there are some good conversations in there.
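As a quick sketch (the table name dbo.Example is hypothetical, not from the original post), you can see both the fragmentation level and the page count that feeds into the optimizer’s cost estimate with sys.dm_db_index_physical_stats:

```sql
-- Hypothetical table dbo.Example: page_count is the figure that grows
-- with page splits and raises the optimizer's cost estimate for the index.
SELECT
    i.name AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Example'),
                                    NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id;
```

Comparing page_count before and after a rebuild makes the “extra pages from page splits” effect easy to observe for yourself.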
Just like in the root page and the intermediate pages, the FirstName and RowID columns are present.
Also in the leaf, CharCol, our included column, appears! It was not in any of the other levels we inspected, because included columns exist only in the leaf level of a nonclustered index.
Kendra does a great job of explaining the topic.
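A minimal sketch of the kind of index being inspected (table and column names assumed from the excerpt): the key column lives at every level of the b-tree, while the INCLUDE column is stored only in the leaf pages.

```sql
-- CharCol appears only in the leaf level; FirstName (plus the row locator)
-- appears in the root and intermediate pages as well.
CREATE NONCLUSTERED INDEX ix_People_FirstName
    ON dbo.People (FirstName)
    INCLUDE (CharCol);
```

This is why included columns are cheap for seeks but add nothing to the navigational levels of the index.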
This group of DMVs records every scan and large key lookup. When the optimizer finds there isn’t an index to support a query, it generally performs a scan. When this happens, a row is created in the missing index DMV showing the table and columns that were scanned. If that exact same index is requested again, by the same query or another similar query, the counters are incremented. That value will continue to grow as long as the workload keeps calling for the index that doesn’t exist. The DMV also records the cost of the query with the table scan and an estimated percentage improvement if only that missing index had existed. The query below combines those values to calculate an overall value number.
Click through for sample scripts for this and the index usage stats DMV. The tricky part is to synthesize the results of these DMVs into the minimum number of viable indexes. Unlike the optimizer—which is only concerned with making the particular query that ran faster—you have knowledge of all of the queries in play and can find commonalities.
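One common way to combine these DMVs (a sketch, not the author’s exact script) is to weight the optimizer’s cost and impact estimates by the number of requests:

```sql
-- Rank missing-index suggestions by a rough improvement measure:
-- avg cost * avg impact fraction * number of times the index was wanted.
SELECT
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks + migs.user_scans AS requests,
    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
        * (migs.user_seeks + migs.user_scans) AS improvement_measure
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY improvement_measure DESC;
```

Remember that these are per-query suggestions; the consolidation into a minimal set of indexes is still on you.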
In the previous blog post, In-Memory OLTP Indexes – Part 1: Recommendations, we gave you an update on the latest features of In-Memory OLTP technology. We also summarized the key characteristics of memory-optimized indexes and shared guidelines and recommendations on how best to choose and configure an index for your memory-optimized table. If you haven’t read that post yet, we strongly recommend you do so. In this blog post we continue onwards, taking the lessons from Part 1 and applying them in practice with some examples. The lessons from this post (Part 2) will be particularly useful if you are experiencing query performance issues with memory-optimized tables, whether after migration from disk-based tables or in general with a production workload leveraging memory-optimized tables.
To summarize, this blog post covers the following:
Common mistakes and pitfalls to avoid when working with memory-optimized indexes.
Best practices to follow when configuring your memory-optimized indexes for optimal performance.
Troubleshooting and mitigating your query performance issues with memory-optimized indexes.
Monitoring your query performance with memory-optimized indexes.
There’s a lot of detail in this post, and tuning these types of indexes isn’t quite the same as normal, disk-based indexes.
Larger numbers of indices create exponentially more query plan possibilities. When too many choices exist, the Optimizer will give up partway through and just pick the best plan found so far. As more indices are added, the problem worsens and compilation times, i.e., processor times, increase. This can be illustrated best by reviewing an actual customer example. In this case, one table had 144 indices attached to it and several others had between 20 and 130 indices. The queries were quite complex, with as many as fifteen joins, many of which were outer joins. Query and index tuning were impossible because query performance was often counterintuitive and sometimes nonsensical. Adding an index that addressed a specific query need often made the query run worse one time and better the next. Note: cached query plan issues, e.g., parameter sniffing or plan reuse, were not problems in this case. The only solution was to tear down the ENTIRE indexing structure and rebuild it with SQL Server’s guidance and nine days’ worth of production queries. Table 5 summarizes the results of the index restructuring project. The performance of 98 percent of the queries was comparable to or better than it was when the large numbers of indices were present.
Don’t be that company.
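If you want to check whether you are at risk of being that company, a quick inventory (a sketch; the threshold of 10 is arbitrary) against the catalog views will show which tables carry the most indexes:

```sql
-- List user tables with an unusually large number of indexes.
SELECT
    OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
    OBJECT_NAME(i.object_id) AS table_name,
    COUNT(*) AS index_count
FROM sys.indexes AS i
JOIN sys.tables AS t
    ON t.object_id = i.object_id
WHERE i.type > 0          -- exclude heaps
GROUP BY i.object_id
HAVING COUNT(*) > 10      -- arbitrary "worth a look" threshold
ORDER BY index_count DESC;
```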
To further optimize your queries that do point lookups on your memory-optimized table using an equality operator (=), consider using a Hash index. Hash indexes also optimize insert operations and reduce database recovery time. As mentioned earlier, a Hash index uses a hash table internally to keep track of the index keys. The hashing function uses all the keys specified at index creation time to create a mapping between the index key column(s) and the corresponding hash bucket. Hence, it is important that you specify all the index key columns of the hash index during the lookup in your query.
As with any hashing technique, collisions will occur. A hash collision occurs when two or more index keys map to the same hash bucket. In general, a limited number of collisions is expected and is fine. It is only when the number of collisions becomes excessive that it may noticeably begin to impact your query performance. Your aim should be to keep the number of collisions as low as possible. You may experience a higher number of collisions either because:
There are many duplicate index key column values in the table and/or
The number of hash buckets for your Hash index is under-provisioned.
There’s some helpful information here if you are new to In-Memory OLTP.
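As a sketch of both points (the table and BUCKET_COUNT are assumptions, not from the post), here is a hash index sized for the expected number of unique key values, plus the DMV that reveals whether collisions are getting out of hand:

```sql
-- Assumed table: hash index on SessionId, sized near the expected
-- number of unique keys (round up; SQL Server rounds to a power of 2).
CREATE TABLE dbo.SessionCache
(
    SessionId INT NOT NULL,
    UserId    INT NOT NULL,
    Payload   NVARCHAR(200),
    CONSTRAINT pk_SessionCache PRIMARY KEY NONCLUSTERED
        HASH (SessionId) WITH (BUCKET_COUNT = 1000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- High avg_chain_length alongside many empty buckets suggests duplicate
-- key values or an under-provisioned BUCKET_COUNT.
SELECT hs.total_bucket_count, hs.empty_bucket_count,
       hs.avg_chain_length, hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
WHERE hs.object_id = OBJECT_ID(N'dbo.SessionCache');
```

Note that a point lookup only gets the hash seek if the query supplies every key column of the hash index with equality predicates.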
Table 1 shows examples of queries that potentially need tuning based upon the number of executions, total reads, total duration, total CPU time, and average reads per execution. This kind of report immediately focuses attention on the queries that might benefit the most from either index or query tuning. The five queries highlighted in Table 1 underscore these criteria. The ones highlighted in yellow were the worst offenders because their executions collectively performed the most reads with the worst one totaling 3.5 BILLION reads. The ones highlighted in light green and orange accounted for the most CPU time as well as the longest total duration. The one highlighted in slate ran the most times, and the ones highlighted in gray performed the most reads per execution. This information is vital when determining where query and index tuning should be focused.
Jeff walks through some of his data collection and analysis process in this post, making it worth a read.
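A common starting point for this kind of report (a sketch, not Jeff’s exact script) is to rank cached plans by the same metrics the table highlights:

```sql
-- Top cached statements by total logical reads, with per-execution averages.
SELECT TOP (20)
    qs.execution_count,
    qs.total_logical_reads,
    qs.total_worker_time  AS total_cpu_time,
    qs.total_elapsed_time AS total_duration,
    qs.total_logical_reads / qs.execution_count AS avg_reads_per_execution,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```

Re-sorting the same result by total_cpu_time, total_duration, or execution_count surfaces the other categories of offender described above.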
Index usage and tuning metrics became available in SQL Server 2005 with Dynamic Management Views and Functions, which will be discussed later. However, despite only minor additions over the years, the meaning and significance of index DMV/DMF metrics are still not well understood by many. Specifically, the following list contains a synopsis of the topics that the author has observed to be the most salient index-related issues:
- Queries that need an index to function efficiently
- Which indices, if any, are used by a query (and how they are used, e.g., randomly or sequentially)
- Tables (and their indices) that merit evaluation and potential adjustment
- Indices that duplicate functionality of others
- Whether a new index is truly needed and what improvement can be anticipated
- Whether an index can be deleted without harming performance
- Whether an index should be adjusted or added despite the query not producing any missing index warnings
- Understanding why having too many indices results in:
  - Inserts and updates taking too long and/or creating blocking
  - Suboptimal query plans being generated because there are too many index choices
- Knowing Database Engine Tuning Advisor (DTA) pros and cons
Jeff starts with the basics of indexes, followed by some general strategy. This promises to be the first of several posts on the topic.
Now, let’s attempt to create a NONCLUSTERED COLUMNSTORE INDEX:

ALTER TABLE dbo.InMemADD ADD INDEX NCCI_InMem NONCLUSTERED COLUMNSTORE (col1);
Msg 10794, Level 16, State 76, Line 76
The feature ‘NONCLUSTERED COLUMNSTORE’ is not supported with memory optimized tables.
It fails because we can only create a CLUSTERED columnstore index (CCI). For 25 years, Microsoft SQL Server differentiated between indexes that physically ordered data on storage (CLUSTERED) and those that did not (NONCLUSTERED). Unfortunately, they chose to ignore that pattern when creating the syntax for memory-optimized tables; using the word CLUSTERED is required when creating a columnstore index on memory-optimized tables.
It’s a great post with plenty of trial and error.
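For reference, the variant that does succeed (a sketch assuming the same dbo.InMemADD table from the excerpt) uses the CLUSTERED keyword and, as a clustered columnstore, takes no column list:

```sql
-- The only columnstore flavor allowed on a memory-optimized table.
ALTER TABLE dbo.InMemADD
    ADD INDEX CCI_InMem CLUSTERED COLUMNSTORE;
```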
As it transpires, an index rebuild will reset the counters for that index in the sys.dm_db_index_usage_stats DMV, which is potentially very dangerous if you aren’t aware of it. Normally, we determine whether or not an index is in use by looking at this information, but if you perform regular maintenance on the table, you could be resetting the very data you rely on to make an accurate decision.
Read the whole thing.
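One defensive pattern (a sketch; the archive table name is hypothetical) is to snapshot the counters before your maintenance window, so a rebuild can’t erase the evidence:

```sql
-- Persist usage counters before index maintenance runs.
SELECT GETUTCDATE() AS captured_at,
       us.object_id, us.index_id,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
INTO dbo.IndexUsageSnapshot          -- hypothetical archive table
FROM sys.dm_db_index_usage_stats AS us
WHERE us.database_id = DB_ID();
```

On subsequent runs you would INSERT INTO the existing archive table rather than SELECT … INTO; summing the deltas across snapshots gives a usage history that survives both rebuilds and instance restarts.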