Deadlock detection and analysis in SQL Server is important for the overall health of affected applications. This post shows how to setup an Extended Events session to capture deadlock details. It also provides a stored procedure to capture details from the Extended Events session. The stored procedure enables simplified permanent storage of those deadlock detection events. Next, we’ll set up a SQL Server Agent Job to run the stored procedure on an ongoing basis. Finally, we’ll see several examples of how to query the captured events. These queries support making the necessary changes to both the application and database design.
Click through for a description of what a deadlock is as well as scripts to help find and fix them.
Before getting into how to set the DEADLOCK_PRIORITY, let’s quickly go over what the options are. There are two ways to set the priority of a process. The first option is to use one of the keywords, LOW, NORMAL or HIGH. The second approach is to use a numeric value between -10 and 10. The lowest value will be chosen as the victim. For example, LOW will be the victim of the other process is HIGH and -1 will be the victim if the other process is greater than -1.
As I recall, index operations (like rebuilds) are automatically set to a low priority.
Let’s also say that bad query is taking part in a modification.
SET u2.Reputation *= 2
FROM Users AS u
JOIN dbo.Users AS u2
ON CHARINDEX(u.DisplayName, u2.DisplayName) > 0
WHERE u2.Reputation >= 100000;
AND u.Id <> u2.Id;
This query will run for so long that we’ll get sick of waiting for it. It’s really holding up writing this blog post.
Erik rewrites this query a couple of times. Click through to learn what he does and why he does it.
The first thing I found is that there were 16 attempts at promotion, and four successful promotions.
Why did this seem weird? I dunno.
Why would there be only 4 successful attempts with no competing locks from other queries?
Why wouldn’t all 16 get promotions?
Find out the answer to this and much, much more if you click the link.
You will find that the SELECT statement executes, ignoring the exclusive lock, because it is not a write lock, and the data on the page has not been changed.
The main reason people try to do this is to force access to a row in a single threaded manner. For example, building their own sequence number, either in a row they update, or by trying to do MAX() on all of the data in a table to make sure only one reader gets the same value.
This is generally a bad idea, since locking an entire table is a generally bad idea, but if you needed to block readers, you can couple the XLOCK with a PAGLOCK. So, change the first reader to:
FROM Demo.Test WITH (XLOCK,PAGLOCK);
As Louis points out in the summary, locking is complicated. Having a good understanding of the locking model will serve you well, though.
I have a client that used Itzik Ben-Gan’s solution of creating a filtered nonclustered columnstore index to achieve batch mode on a rowstore (in fact I proposed that the client consider it). They have an OLTP system, and often perform YTD calculations. When they tested, processing time was reduced by 30 to 50 percent, without touching a single line of application code. If that ain’t low hanging fruit, I don’t know what is —
However, during testing, I noticed some intermittent blocking that didn’t make sense to me. But I couldn’t nail it down, and they went live with the “filtered nonclustered columnstore index” solution.
Once they deployed – and there was a lot of concurrency – I could see what had eluded me during my proof of concept: blocking in tempdb.
Read on for the repro and check out Ned’s UserVoice bug report.
I can’t make heads or tails of that but I can tell you that seems like a really bad brawl for resources. It’s like a Jerry Springer show with a few extras thrown in. Since I knew that my graph wasn’t going to be helpful in this instance I went to the actual xml and tried to figure out how I could tune this to make it better in the future. I needed to know exactly where the issue was so the waitresource pointer is a good place to start.
You will see many blog articles on how to find SQL wait resources when the resource type is a key, a page, or an object (I suggest Kendra Little’s blog post) There is however a noticeable glut on articles explaining RID (a RID is a key on a table with no clustered index). I finally found how to tie a RID to an actual resource name but it was used for corruption so the details were a bit hazy at first.
Click through for this work of database art as well as a script which links RIDs back to specific object names.
So what causes Range Locks? Just ask Sunil. He knows everything (this assumes the serializable isolation level):
If the key value exists, then the range lock is only taken if the index is non-unique. In the non-unique index case, the ‘range’ lock is taken on the requested key and on the ‘next’ key.
If the ‘next’ key does not exist, then a range lock is taken on the ‘infinity’ value. If the index is unique then a regular S lock on the key.
If the key does not exist, then the ‘range’ lock is taken on the ‘next’ key both for unique and non-unique index.
If the ‘next’ key does not exist, then a range lock is taken on the ‘infinity’ value.
Range Predicate (key between the two values)
‘range lock on all the key values in the range when using ‘between’
‘range’ lock on the ‘next’ key that is outside the range. This is true both for unique and non-unique indexes. This is to ensure that no row can be inserted between the requested key and the one after that. If the ‘next’ key does not exist, then a range lock is taken on the ‘infinity’ value.
Erik has an interesting example and lets us see a potential concurrency problem with multi-table indexed views.
There’s a lot going on there, but much of it is noise. There is a whole bunch of contention on the table
SqlPerf.Session— session 342 is trying to perform an update, but it is stuck waiting on shared locks taken by two services. Now, let’s check the Optimize Layout box above, and look at the circular graph again. Simplified, right?
This checkbox is easily the most powerful option to discard noise and help you focus on the crux of the deadlock issue. In the original graph, you can see that many of the elements presented are simply innocent bystanders — waiters that are captured as part of the deadlock activity, but in no way contributing to it. We can detect this in a lot of cases and so, when you check the box, we hide them from view, allowing you to focus much more directly on the key players involved in the deadlock. There is no question that eliminating the noise can really speed up troubleshooting; with those extra nodes removed, I can clearly see that I have some kind of order-of-operations issue on the
SqlPerf.Sessiontable, between the transfer service and the processor service.
The truncate option is fast and efficient but did you know that it takes a certain lock where you could actually be blocked?
What am I talking about? When you issue a truncate it takes a Sch-M lock and it uses this when it is moving the allocation units to the deferred drop queue. So if it takes this lock and you look at the locking compatibility matrix below you will see what can cause a conflict (C).
Arun includes an image which shows what can block what, and also shows us an example.