The simplest way of removing empty data pages from a heap is via the exclusive locking of the table. But keep in mind that there is no such thing as a free lunch. Blocking the table from other processes will lead to a system that will not scale. As long as the DELETE process locks the table, no other process can access the table. The only exception is the “SNAPSHOT ISOLATION” but explaining the pros and cons of this technique is outside the scope of this article.
The next example (with the new created table) demonstrates the behavior.
The section on lock escalation was rather interesting.
I’m an outspoken advocate of always using a clustered index on each and every table you create as a matter of best practice. But even I will agree that there’s a case for using the odd heap now and then.
Read on for indicators that you might be better served with a heap.
So make sure you really need all that junk in your nonclustered index trunk. Er, key.
But even with the expanded size of key columns, sometimes I get asked a question: do columns that “secretly” get added to the key of a nonclustered index count against the maximum allowed nonclustered index key length?
You can read the short answer, but I recommend reading the full post.
A natural key is one constructed of data that already exists in the table. For example using latitude and longitude in a table of addresses. Or the social security number in a table of employees. (Before you say anything, yes, the social security number is a horrible primary key. Be patient.)
My personal preference is to use surrogate keys most of the time and put unique constraints (or unique indexes) on the natural key. There are some occasions in which I’d deviate, but ceteris paribus I’d pick this strategy..
The CHARACTER_MAXIMUM_LENGTH <> -1 OR IS NULL stuff is because I don’t want to try to index NVARCHAR(MAX) fields, but I do want to index NVARCHAR(50) fields.
The ORDER BY is because I wanted to get a wide variety of leading fields. If I just ordered by c1.COLUMN_NAME, I wouldn’t get all of the first fields covered in the first 999 indexes. (And I’m not even using a large table.)
Sometimes I think I’ve worked on systems which used this script to build indexes. But then I read the index names: “dta.” And it all makes sense.
As you can see, I definitely have a lot of free space, but my data are so spread across the file and especially up to it’s border, that there is no way to make file size smaller.
If we zoom at the very tail we can figure out the names of tables at the very end of the file, which prevent file from shrinking:
This looks quite a bit like the old Windows 95 defrag tool. I like it.
In this case, we have to assume that
Event IDs may be coming from anywhere and, as such, may not arrive in order. Even though we’re largely appending to the table, we may not be appending in a strict order. Using a clustered index to support the table isn’t the best option in this case – data will be inserted somewhat randomly. We’ll spend maintenance cycles defragmenting this data.
Another downside to this approach is that data is largely queried by
Owner ID. These aren’t unique, and one
Owner IDcould have many events or only a few events. To support our querying pattern we need to create a multi-column clustering key or create an index to support querying patterns.
This result is not intuitive to me, and I recommend reading the whole thing.
Forwarded records are rows in a heap that have been moved from the original page to a new page, leaving behind a forwarding record pointer on the original page to point at the new page. This occurs when you update a column that increases the size of the column and can no longer fit on the page. UPDATEs can cause forwarded records if the updated data does not fit on the page. Forwarding pointers are used to keep track of where the data is.
The comments are also worth reading. Except for the terrible puns.
The query runs faster, make no mistake – but check out the estimates:
- Estimated number of rows = 1
- Actual number of rows = 165,367
Those estimates are built by SQL Server’s cardinality estimator (CE), and there have been major changes to it over the last couple of versions. You can control which CE you’re using by changing the database’s compatibility level. This particular StackOverflow database is running in 2016 compat mode – so what happens if we switch it back to 2012 compat mode?
Based on this result, there might be further optimizations available. Read on for more of Brent’s thoughts.
I was looking through some terms in SQL Server documentation the other day, thinking about what it’s like to learn about SQL Server’s indexes when you’re new to the field. I jotted down a note: B-tree = Rowstore = Disk Based
And then I realized that’s not quite right.
Not all disk based indexes are traditional clustered and nonclustered indexes. Columnstore indexes are also disk based. Updatable Columnstore indexes use special rowstore B-trees behind the scenes. And Books Online says “rowstore” also refers to Memory-Optimized tables.
If you’re new to indexing, this picture will save you some learning time.