Understanding NESTING_TRANSACTION_FULL

Joe Obbish digs into the NESTING_TRANSACTION_FULL latch, explains what it does, and shows how it might be a performance bottleneck:

We know that only one worker can get the exclusive latch for the transaction at a time. Let’s use a greatly simplified model for what each parallel worker does for this query. It reads a row, does processing for a row, and goes on to the next one. Once it has enough rows to write out a log record it tries to acquire the latch. If no one else has the latch in exclusive mode it can get the latch, update some structure in the parent transaction, release the latch, and continue reading rows. If another worker has the latch in exclusive mode then it adds itself to the FIFO wait queue for the latch subresource and suspends itself. When the resource is available the worker status changes from SUSPENDED to RUNNABLE. When it changes again from RUNNABLE to RUNNINGit acquires the latch, updates some structure in the parent transaction, releases the latch, and continues working until it either needs to suspend again or hits the end of its 4 ms quantum. When it hits the end of its 4 ms quantum it will immediately select itself to run again because there are no other runnable workers on the scheduler.

So what determines the level of contention? One important factor is the number of workers that are contending over the same subresource. For this latch and type of query (rows are pretty evenly distributed between worker threads), this is simply MAXDOP. There’s a tipping point for this query where adding more workers is simply counterproductive.

For years I’ve seen people in the community state that running queries at MAXDOPthat’s too high can be harmful. I’ve always been after simple demos that show why that can happen. The NESTING_TRANSACTION_FULL latch is an excellent example of why some queries run longer if MAXDOP is increased too far. There’s simply too much contention over a shared resource.

Read the whole thing.

Related Posts

HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING Wait Type

Chirag Shah explains what the HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING wait type really means: Recently a customer reported an interesting issue, while querying against recently added readable replica, SELECT statement is shown as suspended and session is shown as waiting on HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING […] Upon more investigation, it appeared to be waiting on with a wait type HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING The behavior is by […]

Read More

FCB_REPLICA_SYNC Spinlock Explanation

Paul Randal explains what the FCB_REPLICA_SYNC spinlock is and what it does: In a nutshell, this spinlock is used to synchronize access to the list of pages that are present in a database snapshot, as follows: If a page in a database with one or more database snapshots is being updated, check each snapshot’s list to […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930