If you think about it, if all we’re ever going to use is one part of the index, i.e. just the unfinalized rows, having an entry in there for every single row is quite wasteful, as although the vast majority of the index will never be used, it still has to be maintained.
So in SQL Server 2008, we got the ability to create a filtered index. Now these were actually added to support sparse columns. But on their own, they’re incredibly useful anyway.
I use these on occasion but less than I want to, and a big part of the reason why is in this post, particularly around parameters.
In SQL Server you need to pick which columns you like to index, In CosmosDB you need to pick which columns not to index. It’s kind of same thing at the end. You might ask “If everything is indexed and working fine, why do you want me to poke the well running system?” When we compare SQL Server indexes to CosmosDB Indexes, one thing works exactly same. That is the index file size. CosmosDB holds the indexes in a separate file like SQL Server and if we want to index everything, index file size is going to get large. Since we need to pay for the file space in CosmosDB, you might need to pay extra for indexes that you might never use. Also, your updates, inserts and deletes might cost you more Request Units since CosmosDB needs to maintain all the indexes in the background.
There’s just enough difference to make you pay the price if you assume Cosmos DB works just like SQL Server.
This is missing index request #1, and by default, this is the only missing index we’ll see by looking at the graphical execution plan. There’s actually a missing index request #2, which we can find in the XML (I know, it’s a little ugly to read. Bear with me).
I am of two minds on this. It probably should be easier to see multiple index candidates, but there’s already so much risk of people just copy-pastaing missing index recommendations that adding more seems like a bad idea.
There are workloads where frequent page splits are a problem. I thought I had a system like this many years ago, so I tested various fill factor settings for the culprit table’s clustered index. While insert performance improved by lowering the fill factor, read performance drastically got worse. Read performance was deemed much more critical than write performance on this system. I abandoned that change and instead recommended a table design change since it made sense for that particular table.
Click through for a demo.
SQL Server 2019 brings a very exciting new feature that, is long overdue. Resumable online index create is one of my favorite new things. This paired with the Resumable Index Rebuilds introduced with SQL Server 2017 really gives database administrators much more control over index processes.
Have you ever started to build a new index on very large table only to have users call and complain their process is hung, not completing, or system is slow? That’s when you realize you’re the cause because you tried to sneak in a new index. I have many times, because creating a new index can impact performance and can be a problematic process for users when you have no or little downtime windows available. When you kill the create process it rolls back requiring you to start from the beginning the next time. With resumable online index creation you now have the ability to pause and restart the build at the point it was paused. You can see where this can be very handy.
Click through for a demo and discussion on what options are available.
In a past post here I did an overview of different index types. I said in that post that I think filtered indexes could be more popular. In this post I will cover fixing some of the problems caused when you first introduce rowstore filtered indexes to a SQL Server database.
Some of you have probably been there already. You’ve put in your first filtered index on a database only to find an issue has happened. I’ve witnessed these issues at a few places. This will hopefully reduce the pain.
I’ve definitely experienced the third issue (which also pops up when using parameterized queries, so the optimizer doesn’t know that it can use the filtered index), but never the first two.
In the last year or so, with a large customer who makes fairly heavy use of this pattern, I’ve noticed another concern. Sometimes, and I can’t figure out what exactly triggers it, the execution plan generated, will do a seek against the nonclustered index and then do a key lookup against the columnstore as seen below. This is bad for two reasons–first the key lookup is super expensive, and generally columnstores are very large, secondly this key lookup is in row execution mode rather than batch and drops the rest of the execution plan into row mode, thus slowing the query down even further.
Joey also has a UserVoice item as well, so check it out.
About 10 years ago, I decided to drop an unused index on a table that had 2 billion rows. The database was around 7TB in size. We were having storage and performance issues. I thought I could help the system out if I were to drop the index.
4 days after I dropped the index, I got a call from our NOC that CPU utilization was pegged at 100% for an hour so they were reaching out to the on-call DBA to check it out. I logged in and saw a query using A LOT of CPU. Its WHERE clause matched the index I had dropped. The execution plan showed it was scanning the table.
It turned out that I only had 2 weeks of uptime, which didn’t include the 1st of the month. The query that was causing the high CPU utilization was a report that ran on the 1st of each month.
Tara has also provided us with a script to track these details over time, so check that out.
The authors conducted a series of microbenchmarks as follows:
scans with single predicates with varying selectivity to study the trade-off between the range scan of a B+ tree vs a columnstore scan
sort and group-by queries to study the benefit of the sort order supported by B+ trees (columnstores in SQL Server are not sorted).
update statements with varying numbers of updated rows to analyze the cost of updating the different index types
mixed workloads with different combinations of reads and updates
It’s interesting to read an academic paper covering the topic, particularly when you can confirm that it works well in practice too.
When beginning to learn SQL, at some point you learn that indexes can be created to help improve the performance of queries.
Creating your first few indexes can be intimidating though, particularly when trying to understand what order to put your key columns in.
Today we’ll look at how row store indexes work to understand whether index column order matters.
Despite the flagrant violation, you should check out Bert’s post, as it’s a good one.