Understanding NESTING_TRANSACTION_FULL

Joe Obbish digs into the NESTING_TRANSACTION_FULL latch, explains what it does, and shows how it might be a performance bottleneck:

We know that only one worker can get the exclusive latch for the transaction at a time. Let’s use a greatly simplified model for what each parallel worker does for this query. It reads a row, does processing for a row, and goes on to the next one. Once it has enough rows to write out a log record it tries to acquire the latch. If no one else has the latch in exclusive mode it can get the latch, update some structure in the parent transaction, release the latch, and continue reading rows. If another worker has the latch in exclusive mode then it adds itself to the FIFO wait queue for the latch subresource and suspends itself. When the resource is available the worker status changes from SUSPENDED to RUNNABLE. When it changes again from RUNNABLE to RUNNINGit acquires the latch, updates some structure in the parent transaction, releases the latch, and continues working until it either needs to suspend again or hits the end of its 4 ms quantum. When it hits the end of its 4 ms quantum it will immediately select itself to run again because there are no other runnable workers on the scheduler.

So what determines the level of contention? One important factor is the number of workers that are contending over the same subresource. For this latch and type of query (rows are pretty evenly distributed between worker threads), this is simply MAXDOP. There’s a tipping point for this query where adding more workers is simply counterproductive.

For years I’ve seen people in the community state that running queries at MAXDOPthat’s too high can be harmful. I’ve always been after simple demos that show why that can happen. The NESTING_TRANSACTION_FULL latch is an excellent example of why some queries run longer if MAXDOP is increased too far. There’s simply too much contention over a shared resource.

Read the whole thing.

Related Posts

Taking Action With Wait Stats

Aaron Bertrand lays out a course of action (or inaction) when dealing with the most common wait types in SQL Server: I started going a little further than this, mapping out some of the more common wait types, and noting some of the properties they shared. Translated into questions a tuner might have about a […]

Read More

SOS_WORK_DISPATCHER

Joe Obbish digs into a new wait type in SQL Server 2019: Upon upgrading to SQL Server 2019 CTP2, you may see the new SOS_WORK_DISPATCHER wait type at the top of the list: The above screenshot is server level wait stats from my four core desktop after SQL Server was running for a few hours. SQL […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930