I have seen this debated in forums spread over the internet for decades, and the advice that we gave ten years ago isn’t as valid today as it was then. Ten years ago, memory was considerably less, and disks were spinning rust. The advent of SSD’s and the ability to get servers with more memory than data, even on large systems have changed how we should think about designing and maintaining databases.
I generally subscribe to the NUSE philosophy: Narrow, Unique, Static, Ever-Increasing. That generally leads me to selecting identity integers or longs. For junction tables (whose entire purpose is to join two tables together and which never get referenced outside of that), I use the primary key as the clustered index.
In extreme insert scenarios, I can see wanting to maximize fragmentation in order to insert into more pages in the B-tree and avoid hot spot pages.
In my last blog post, I wanted to focus on the sequential vs random I/O aspect and how that part should be more or less irrelevant with modern hardware. So I did a test that did a full scan (following the linked list of an index) and see if I could notice any performance difference on my SSD. I couldn’t.
That isn’t the end of the story, it turns out. Another aspect is how the data is brought into memory. You might know that SQL server can do “read ahead”, meaning it does larger reads per I/O instead if single-page I/O. I.e., fewer but larger I/O operations. This sounds fine, but what happens when we have fragmentation?
Read on for a situation in which fragmentation does matter.
What is a filtered index?
Simply it’s an index with a where clause. It is an optimized non clustered index that can be narrowed down in scope to better fit a subset of data. Example being date ranges, years, non NULLs or specific product types.
I wish filtered indexes were better than they are because they can solve some interesting problems but get stuck on parameterized queries.
Last week I was speaking at SQLDay 2019 conference in Wroclaw, Poland. My session was about things you should focus on during work with Legacy Databases. One of the topics I discussed was concerning the database usage statistics collection and aggregation (mainly indexes and stored procedures).
After the session, one of the attendees came to me and ask me if I know that actions causing check of Foreign Keys and that use indexes underneath, don’t update index usage stats DMV. I was very surprised because, in my opinion, such behavior would be a huge SQL Server defect. So I decided to check it out…
Read on for Marek’s explanation and demo.
The summary top row suggests that all inserts to an empty clustered index will be minimally logged as long as
ORDERhints are specified. The
TABLOCKhint is required to enable the
RowSetBulkfacility as used for heap table bulk loads. An
ORDERhint is required to ensure rows arrive at the Clustered Index Insert plan operator in target index key order. Without this guarantee, SQL Server might add index rows that are not sorted correctly, which would not be good.
Unlike other bulk loading methods, it is not possible to specify the required
ORDERhint on an
INSERT...SELECTstatement. This hint is not the same as using an
ORDER BYclause on the
ORDER BYclause on an
INSERTonly guarantees the way any identity values are assigned, not row insert order.
Read on to see what you can do.
We compress to gain space, correct? So why sacrifice it to our indexes?
Here is a script (complete with demo databases and tables) that will find the culprits and fix them for you. Put it (minus the demo databases and tables) in a job, and quit worrying about this. Your tables (and indexes) will be sparkling clean. You’re welcome.
Click through for the script. It’s a good reminder that compressing a clustered index does not automatically compress non-clustered indexes—nor should it, as these won’t necessarily compress at the same ratio because data type makeup can differ.
Locking hints can be really handy in these situations, especially the READPAST hint. The documentation for it says that it allows you to skip over row level locks (that means you can’t skip over page or object level locks).
What it leaves out is that your READPAST query may also need to try to take row level shared locks.
Read on for an example as well as an alternative which ends up being better in this case.
You know the story. Every week or so, we defragment the indexes. Many of us uses Ola Hallengren’s great script for this, some uses Maintenance Plans, and there are of course other alternatives as well. But are we just wasting time and effort? Quite probably we are. I’m going to start with some basics, and then do some reasoning, and finally give you some numbers of a very simple test that I ran. The T-SQL code is available. If you give it a try, please let us know your finding for your environment by adding a comment. I will do some generalizations and simplifications, to avid this post being 10 times longer.
Jeff Moden has a couple of great talks on the topic which really pushed me in this direction. Grab his slides from the SQL Saturday site for a much deeper look at this topic.
IGNORE_DUP_KEYindex option can be specified for both clustered and nonclustered unique indexes. Using it on a clustered index can result in much poorer performance than for a nonclustered unique index.
The size of the performance difference depends on how many uniqueness violations are encountered during the
INSERToperation. The more violations, the worse the clustered unique index performs by comparison. If there are no violations at all, the clustered index insert may even perform better.
IGNORE_DUP_KEY primarily in cases like queue tables where I might be queuing up changes to migrate to a warehouse and where the chance of collision is low but non-zero. It looks like pushing much beyond that pattern can be devastating for performance.
The creation of this view has chewed up a bunch of storage. It has jumped right up to the number two spot on the biggest objects list within this database. You can see that differences by comparing the highlighted rows to the previous image. The vPerson view is highlighted in red in this second image to help point it out quickly.
Surely this must be a contrived example and people don’t really do this in the real world, right? The answer to that is simply: NO! It DOES happen. I see situations like this all too often. Far too often, large text fields are added to an indexed view to make retrieval faster. I have mimicked that by adding in two XML columns from the Person.Person table. This is definitely overkill because a simple join back to the table based on the BusinessEntityID would get me those two columns. All I have effectively done is duplicated data being stored and I have achieved that at the low low cost of increased storage of 25% for this small database. If you are curious, the column count between the Person.Person table and this new view is 13 columns each.
Jason takes us through a couple more gotchas and provides some important advice you should follow if you think indexed views might be a fit for you.