A couple of notes on the query. I cast query_plan to xml so that I can use XQuery to pull out the information. It is possible that a plan might be so large that you get an error because of the xml data type's limit on nesting levels. Also, I aggregate the information from sys.query_store_runtime_stats. You may want to modify this to look only at limited time ranges. I’ll leave that to you as an exercise.
Do read Grant’s warning in the conclusion.
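The nesting-level caveat is easy to check before you cast. The sketch below, in Python for illustration, counts element nesting depth in a plan's XML text and compares it against 128, the maximum depth SQL Server's xml data type accepts. The function names and sample documents are hypothetical, and the count covers elements only, so treat it as an approximation of what the engine checks rather than an exact replica.

```python
import io
import xml.etree.ElementTree as ET

# Maximum nesting depth accepted by SQL Server's xml data type.
XML_TYPE_MAX_DEPTH = 128


def xml_depth(xml_text: str) -> int:
    """Return the deepest element nesting level in an XML document.

    Uses iterparse with start/end events instead of recursion, so very deep
    plans do not hit Python's recursion limit. Assumes the text carries no
    conflicting encoding declaration, since it is re-encoded as UTF-8 here.
    """
    deepest = current = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, _elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            current += 1
            deepest = max(deepest, current)
        else:
            current -= 1
    return deepest


def fits_in_xml_type(xml_text: str) -> bool:
    """Hypothetical pre-check: is the element nesting within the xml type's limit?"""
    return xml_depth(xml_text) <= XML_TYPE_MAX_DEPTH


if __name__ == "__main__":
    shallow = "<ShowPlanXML><BatchSequence><Batch/></BatchSequence></ShowPlanXML>"
    deep = "<n>" * 130 + "</n>" * 130
    print(xml_depth(shallow), fits_in_xml_type(shallow))  # 3 True
    print(xml_depth(deep), fits_in_xml_type(deep))        # 130 False
```

Plans that fail this check are the ones where you would need to keep working with the nvarchar text instead of the xml cast.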
Usually, the features added to the CREATE TABLE syntax in new releases of SQL Server are esoteric, and unless you are dealing with memory-optimized tables or similarly specialized features, they aren’t of great interest. However, inline INDEX syntax, for both table-level and column-level indexes, crept in quietly with SQL Server 2014 (12.x). This was interesting because the SQL Server team back-fitted it to all tables rather than just the in-memory OLTP tables for which it was, at the time, necessary. The new syntax allows you to create certain index types inline with the table definition, either at the column level, covering just that column, or at the table level, with indexes containing several columns.
Why interesting? This affects multi-statement table functions, user-defined table types, and table-valued parameters, as well as table variables. It was considered a game-changer for table variables because, for a start, it allowed non-unique indexes or explicit clustered indexes to be declared on their columns for the first time, since you can create indexes on table variables as part of the table definition. Of more significance were the table-level indexes that allowed you to specify multi-column indexes. Previous releases had allowed multi-column primary key or unique constraints, but not explicitly named indexes. You still cannot declare an index on a table variable after it is created, which is a shame, as there are good reasons for doing so once a table is stocked with data: any sort of large import of data into a table that is over-indexed or prematurely indexed is doomed to crawl rather than run. I’ll show this later on in this article.
Click through for an analysis of inline indexes themselves, as well as how they fit on table variables, something I tend not to use much.
The time it takes to rebuild an index can be substantially longer with ONLINE. Many of us have other things to do with the database during nights and weekends. Yes, these are the typical windows in which we try to fit tasks such as index rebuilds. Say that you rebuild at night and it currently takes 4 hours. Wouldn’t it be nice if you could cut that down to 1.5 hours? That would leave more time for imports, data massaging, CHECKDB, and other things you want to do. Sure, you can rebuild ONLINE, but it will slow down access during the rebuild. Also, the more data you modify during the rebuild, the more space you need in tempdb.
Betteridge’s Law of Headlines applies too, so that’s two important principles in one post.
As far as the post goes, Tibor makes a fair point: there is a trade-off between availability and efficiency with index rebuilds. But having worked with clustered columnstore indexes in SQL Server 2014, you’ll pry the online operations in subsequent versions out of my cold, dead hands.
When building indexes for your queries, the order of your index key columns matters. SQL Server can make the most effective use of an index if the data in that index is stored in the same order your query requires for a join, WHERE predicate, grouping, or ORDER BY clause.
But if your query requires multiple key columns because of multiple predicates (e.g. WHERE Color = ‘Red’ AND Size = ‘Medium’), in what order should you define the columns in your index key?
One of my favorite books for query tuning is a bit long in the tooth at this point but remains quite relevant, and a key point there is to look for ways to drop the largest percentage of rows as early as possible. This applies to good indexes as well: they let you ignore as large a percentage of your irrelevant data as you can, as soon as possible.
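One way to put numbers on "drop the largest percentage of rows first" is to measure how many rows each single-column predicate keeps on its own. The Python sketch below does that for a hypothetical Products sample using the Color and Size predicates from the excerpt; the data and helper names are made up for illustration. Treat the result as one heuristic, not a rule: with equality predicates on both key columns, a B-tree seek reaches the same rows in either column order, so ordering matters more once range predicates or other queries that use only the leading column come into play.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def surviving_rows(rows: List[Row], predicates: Dict[str, Callable[[Row], bool]]) -> Dict[str, int]:
    """Count how many rows each single-column predicate keeps on its own."""
    return {col: sum(1 for r in rows if pred(r)) for col, pred in predicates.items()}


def order_by_selectivity(rows: List[Row], predicates: Dict[str, Callable[[Row], bool]]) -> List[Tuple[str, int]]:
    """Return (column, surviving rows) pairs, most selective (fewest rows kept) first."""
    counts = surviving_rows(rows, predicates)
    return sorted(counts.items(), key=lambda item: item[1])


# Hypothetical product data: 10 colors and 3 sizes, evenly distributed.
COLORS = ["Red", "Blue", "Green", "Black", "White", "Gray", "Pink", "Tan", "Navy", "Gold"]
SIZES = ["Small", "Medium", "Large"]
products = [{"Color": COLORS[i % 10], "Size": SIZES[i % 3]} for i in range(3000)]

predicates = {
    "Color": lambda r: r["Color"] == "Red",
    "Size": lambda r: r["Size"] == "Medium",
}

if __name__ == "__main__":
    print(order_by_selectivity(products, predicates))
```

On this sample, Color = 'Red' keeps 300 of 3,000 rows while Size = 'Medium' keeps 1,000, so Color is the more selective candidate for the leading key column.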
Indexes are probably the number one tool we have to improve performance. That said, there are times when we want to put an index on hold. While indexes dramatically improve read performance, they do cause a slight dip in write performance. This isn’t significant most of the time, but when doing a large load, it can frequently be faster to get rid of the existing indexes and then put them back when you are done.
I don’t think that I’ve ever regularly disabled indexes, even during bulk loading. It’s good to know that the option exists, however.
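To try the load-then-index pattern in a runnable form, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite has no equivalent of disabling an index, so it compares the closest analogue: creating the index before the load versus creating it after. The table, index name, and row counts are hypothetical, and timings vary by machine and engine, so compare them yourself rather than relying on any particular ratio.

```python
import random
import sqlite3
import time


def timed_load(rows, index_before_load: bool):
    """Load rows into a fresh in-memory table, creating the index before or after the load.

    Returns (elapsed seconds, row count, index names on the table).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    create_index = "CREATE INDEX ix_sales_region_amount ON sales (region, amount)"

    start = time.perf_counter()
    if index_before_load:
        conn.execute(create_index)  # every insert must also maintain the index
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    if not index_before_load:
        conn.execute(create_index)  # build the index once, after the data is in place
    conn.commit()
    elapsed = time.perf_counter() - start

    row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    index_names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'sales'")]
    conn.close()
    return elapsed, row_count, index_names


if __name__ == "__main__":
    rng = random.Random(0)
    regions = ["North", "South", "East", "West"]
    data = [(rng.choice(regions), rng.uniform(1, 500)) for _ in range(100_000)]
    for before in (True, False):
        secs, count, names = timed_load(data, before)
        print(f"index_before_load={before}: {secs:.2f}s, {count} rows, indexes={names}")
```

Both runs end in the same state (same rows, same index); only the moment the index gets built differs.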
A while back I promised I’d write about what allows SQL Server to perform two seeks rather than a seek with a residual predicate.
More recently, a post touched a bit on predicate selectivity in index design, and how missing index requests don’t factor that in when requesting indexes.
This post should tie the two together a bit. Maybe. Hopefully. We’ll see where it goes, eh?
Also apropos: missing index hints return results in alphabetical order, not in selectivity order or what would be best for queries. In other words, just because the green text in SSMS says it’s the index you want doesn’t mean it’s the index you need.
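A toy model of one common case where this distinction shows up, under heavy simplifying assumptions: each index is a sorted Python list of key tuples, and "rows read" counts the entries a seek range touches. With an equality predicate on B and a range predicate on A, an index keyed (A, B) can only seek on the A range and must test B as a residual predicate, whereas an index keyed (B, A) can use both columns to bound the seek. The columns, data, and functions are hypothetical, and this is not how SQL Server's storage engine is implemented.

```python
import bisect
import random
from typing import List, Tuple

Key = Tuple[int, int, int]


def build_index(rows: List[Tuple[int, int]], leading: str) -> List[Key]:
    """Model a B-tree index as a sorted list of (key1, key2, row_id) tuples."""
    if leading == "a":
        return sorted((a, b, i) for i, (a, b) in enumerate(rows))
    return sorted((b, a, i) for i, (a, b) in enumerate(rows))


def seek_a_then_residual_b(index_ab: List[Key], a_min: int, b_value: int):
    """Index (A, B): seek on A > a_min, then test B = b_value on every row in that range."""
    start = bisect.bisect_left(index_ab, (a_min + 1,))
    scanned = index_ab[start:]
    matches = [row_id for _a, b, row_id in scanned if b == b_value]
    return len(scanned), matches


def seek_both_columns(index_ba: List[Key], a_min: int, b_value: int):
    """Index (B, A): seek directly to B = b_value AND A > a_min."""
    start = bisect.bisect_left(index_ba, (b_value, a_min + 1))
    end = bisect.bisect_left(index_ba, (b_value + 1,))
    scanned = index_ba[start:end]
    return len(scanned), [row_id for _b, _a, row_id in scanned]


if __name__ == "__main__":
    rng = random.Random(7)
    data = [(rng.randrange(1000), rng.randrange(50)) for _ in range(20_000)]
    read_ab, found_ab = seek_a_then_residual_b(build_index(data, "a"), 500, 3)
    read_ba, found_ba = seek_both_columns(build_index(data, "b"), 500, 3)
    print(f"(A, B) index: read {read_ab} entries for {len(found_ab)} matches")
    print(f"(B, A) index: read {read_ba} entries for {len(found_ba)} matches")
```

Both strategies return the same rows; the difference is how much of the index each one has to touch to find them.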
If you think about it, if all we’re ever going to use is one part of the index, i.e. just the unfinalized rows, having an entry in there for every single row is wasteful: the vast majority of the index will never be used, yet it still has to be maintained.
So in SQL Server 2008, we got the ability to create a filtered index. Now these were actually added to support sparse columns. But on their own, they’re incredibly useful anyway.
I use these on occasion but less than I want to, and a big part of the reason why is in this post, particularly around parameters.
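To see the basic idea of a filtered index in a runnable form, the sketch below uses a SQLite partial index through Python's built-in sqlite3 module as a stand-in for a SQL Server filtered index. The table, columns, and index name are hypothetical, and SQLite's rules for deciding when a partial index applies are not the same as SQL Server's, so this only illustrates the general point that the query's predicate has to match the index filter before the index can be used.

```python
import sqlite3


def plan_for(conn: sqlite3.Connection, sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail text (last column of each row) as one string."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, finalized INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer_id, finalized) VALUES (?, ?)",
    [(i % 500, 0 if i % 97 == 0 else 1) for i in range(50_000)],  # roughly 1% unfinalized
)
# Partial (filtered) index: only unfinalized rows get an index entry.
conn.execute("CREATE INDEX ix_orders_unfinalized ON orders (customer_id) WHERE finalized = 0")

# The query repeats the index filter, so the partial index is a candidate.
filtered = plan_for(conn, "SELECT id FROM orders WHERE customer_id = 42 AND finalized = 0")
# Without the filter predicate, the partial index cannot answer the query.
unfiltered = plan_for(conn, "SELECT id FROM orders WHERE customer_id = 42")

if __name__ == "__main__":
    print(filtered)
    print(unfiltered)
```

The exact wording of the plan text differs between SQLite versions, which is why only the presence or absence of the index name is worth checking.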
In SQL Server, you need to pick which columns you want to index; in Cosmos DB, you need to pick which columns not to index. It’s kind of the same thing in the end. You might ask, “If everything is indexed and working fine, why do you want me to poke a well-running system?” When we compare SQL Server indexes to Cosmos DB indexes, one thing works exactly the same: index size. Cosmos DB holds the indexes in a separate file, like SQL Server does, and if we want to index everything, the index file is going to get large. Since we pay for storage in Cosmos DB, you might end up paying extra for indexes that you never use. Also, your updates, inserts, and deletes might cost you more Request Units, since Cosmos DB needs to maintain all of those indexes in the background.
There’s just enough difference to make you pay the price if you assume Cosmos DB works just like SQL Server.
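The opt-out model shows up directly in a Cosmos DB container's indexing policy, which is a JSON document. The Python sketch below builds one that indexes everything under the root path except a few named properties; the property names are hypothetical, and the `_etag` exclusion mirrors what the default policy excludes. Check the result against the Azure documentation for your API and SDK before applying it to a real container.

```python
import json
from typing import Iterable


def build_indexing_policy(excluded_properties: Iterable[str]) -> dict:
    """Index everything except the listed top-level properties (opt-out model)."""
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # Start from "index every path"...
        "includedPaths": [{"path": "/*"}],
        # ...then carve out the system _etag path plus properties you never filter or sort on.
        "excludedPaths": [{"path": '/"_etag"/?'}]
        + [{"path": f"/{name}/*"} for name in excluded_properties],
    }


if __name__ == "__main__":
    policy = build_indexing_policy(["description", "auditTrail"])
    print(json.dumps(policy, indent=2))
```

Each excluded path is one less index structure to store and to maintain on every write.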
This is missing index request #1, and by default, this is the only missing index we’ll see by looking at the graphical execution plan. There’s actually a missing index request #2, which we can find in the XML (I know, it’s a little ugly to read. Bear with me.)
I am of two minds on this. It probably should be easier to see multiple index candidates, but there’s already so much risk of people just copy-pastaing missing index recommendations that adding more seems like a bad idea.
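Reading every request out of the XML yourself is one way to see past the single suggestion SSMS displays. The Python sketch below parses the showplan namespace and lists each MissingIndexGroup with its equality, inequality, and include columns. The embedded plan is a hand-trimmed, hypothetical fragment (database, tables, columns, and Impact values are made up), so verify element names against a real plan before relying on it.

```python
import xml.etree.ElementTree as ET

SHOWPLAN_NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}

# Hand-trimmed, hypothetical plan fragment with two missing index groups.
SAMPLE_PLAN = """\
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence><Batch><Statements><StmtSimple><QueryPlan>
    <MissingIndexes>
      <MissingIndexGroup Impact="61.5">
        <MissingIndex Database="[DemoDb]" Schema="[dbo]" Table="[Products]">
          <ColumnGroup Usage="EQUALITY"><Column Name="[Color]" ColumnId="3"/></ColumnGroup>
          <ColumnGroup Usage="INCLUDE"><Column Name="[Price]" ColumnId="5"/></ColumnGroup>
        </MissingIndex>
      </MissingIndexGroup>
      <MissingIndexGroup Impact="22.4">
        <MissingIndex Database="[DemoDb]" Schema="[dbo]" Table="[Orders]">
          <ColumnGroup Usage="INEQUALITY"><Column Name="[OrderDate]" ColumnId="2"/></ColumnGroup>
        </MissingIndex>
      </MissingIndexGroup>
    </MissingIndexes>
  </QueryPlan></StmtSimple></Statements></Batch></BatchSequence>
</ShowPlanXML>
"""


def missing_index_requests(plan_xml: str) -> list:
    """Return every missing index request in a showplan, not just the first one."""
    root = ET.fromstring(plan_xml)
    requests = []
    for group in root.iterfind(".//sp:MissingIndexGroup", SHOWPLAN_NS):
        for index in group.iterfind("sp:MissingIndex", SHOWPLAN_NS):
            columns = {"EQUALITY": [], "INEQUALITY": [], "INCLUDE": []}
            for column_group in index.iterfind("sp:ColumnGroup", SHOWPLAN_NS):
                names = [c.get("Name") for c in column_group.iterfind("sp:Column", SHOWPLAN_NS)]
                columns.setdefault(column_group.get("Usage"), []).extend(names)
            requests.append({
                "impact": float(group.get("Impact")),
                "table": f'{index.get("Schema")}.{index.get("Table")}',
                **{usage.lower(): cols for usage, cols in columns.items()},
            })
    return requests


if __name__ == "__main__":
    for number, request in enumerate(missing_index_requests(SAMPLE_PLAN), start=1):
        print(number, request)
```

This only surfaces the requests; whether any of them deserves to become a real index is still a judgment call.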
There are workloads where frequent page splits are a problem. I thought I had a system like this many years ago, so I tested various fill factor settings for the culprit table’s clustered index. While insert performance improved as I lowered the fill factor, read performance got drastically worse. Read performance was deemed much more critical than write performance on this system, so I abandoned that change and instead recommended a table design change, since it made sense for that particular table.
Click through for a demo.
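The trade-off above can be sketched with a deliberately simplified model of an index's leaf level: each page holds a fixed number of rows, the index is built with pages filled to the fill factor, and an insert into a full page splits it 50/50. It ignores row sizes, non-leaf levels, and how SQL Server handles inserts at the end of an index, and all parameters are hypothetical. The point is only the direction of the trade: a lower fill factor absorbs more inserts without splitting, at the cost of more pages to read from the start.

```python
import bisect
import random
from typing import List, Tuple


def simulate_page_splits(fill_factor: int, capacity: int = 100, initial_rows: int = 10_000,
                         inserts: int = 2_000, seed: int = 42) -> Tuple[int, int, int]:
    """Return (initial_pages, splits, final_pages) for random inserts into a leaf level.

    Simplified model: each page holds up to `capacity` rows; the index is built with
    pages filled to `fill_factor` percent; an insert into a full page splits it 50/50.
    """
    rows_per_page = max(1, capacity * fill_factor // 100)
    existing = list(range(0, 2 * initial_rows, 2))  # even keys, already sorted
    pages: List[List[int]] = [existing[i:i + rows_per_page]
                              for i in range(0, len(existing), rows_per_page)]
    initial_pages = len(pages)

    rng = random.Random(seed)
    # Odd keys land between existing keys, mimicking random (non-sequential) inserts.
    new_keys = rng.sample(range(1, 2 * initial_rows, 2), inserts)
    splits = 0
    for key in new_keys:
        first_keys = [page[0] for page in pages]
        idx = max(bisect.bisect_right(first_keys, key) - 1, 0)  # page whose key range covers `key`
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) > capacity:  # overflow: split into two roughly half-full pages
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
            splits += 1
    return initial_pages, splits, len(pages)


if __name__ == "__main__":
    for ff in (100, 90, 70):
        start_pages, splits, end_pages = simulate_page_splits(ff)
        print(f"fill factor {ff}: {start_pages} pages at build, {splits} splits, {end_pages} pages after")
```

In this model, a fill factor of 70 starts with 143 leaf pages versus 100 at a fill factor of 100, which is the extra read cost you pay up front in exchange for fewer splits.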