Joe Obbish digs into the NESTING_TRANSACTION_FULL latch, explains what it does, and shows how it might be a performance bottleneck:
We know that only one worker can get the exclusive latch for the transaction at a time. Let’s use a greatly simplified model of what each parallel worker does for this query. It reads a row, does some processing for that row, and goes on to the next one. Once it has enough rows to write out a log record it tries to acquire the latch. If no one else holds the latch in exclusive mode, it acquires the latch, updates some structure in the parent transaction, releases the latch, and continues reading rows. If another worker holds the latch in exclusive mode, it adds itself to the FIFO wait queue for the latch subresource and suspends itself. When the resource becomes available, the worker’s status changes from SUSPENDED to RUNNABLE. When it changes again from RUNNABLE to RUNNING, it acquires the latch, updates some structure in the parent transaction, releases the latch, and continues working until it either needs to suspend again or reaches the end of its 4 ms quantum. At the end of its quantum it will immediately select itself to run again, because there are no other runnable workers on the scheduler.
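That suspend-and-resume cycle is observable from a second session while the parallel query runs. As a rough sketch (the parallel query itself is assumed here; this is not Joe’s exact repro), each waiting worker shows up as a LATCH_EX wait that names the latch class:

    -- Run from a second session while the parallel query executes.
    -- Each parallel worker queued on the parent transaction's latch
    -- appears as a LATCH_EX wait on NESTING_TRANSACTION_FULL.
    SELECT wt.session_id,
           wt.exec_context_id,      -- one row per parallel worker
           wt.wait_type,            -- LATCH_EX
           wt.wait_duration_ms,
           wt.resource_description  -- names the latch class
    FROM sys.dm_os_waiting_tasks AS wt
    WHERE wt.wait_type = N'LATCH_EX'
      AND wt.resource_description LIKE N'%NESTING_TRANSACTION_FULL%'
    ORDER BY wt.session_id, wt.exec_context_id;

Because the wait queue is FIFO, you’d expect wait durations to stay fairly even across workers rather than any one worker starving.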
So what determines the level of contention? One important factor is the number of workers contending over the same subresource. For this latch and this type of query (rows are pretty evenly distributed between worker threads), that number is simply MAXDOP. There’s a tipping point for this query where adding more workers becomes counterproductive.
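One way to find that tipping point, assuming a repeatable test query and an otherwise quiet server, is to snapshot the cumulative latch statistics, run the statement at each degree of parallelism via a hint, and diff the counters (the query placeholder below is hypothetical):

    -- Snapshot cumulative waits for the latch class before and after
    -- each run; the delta shows how contention grows with MAXDOP.
    SELECT latch_class, waiting_requests_count, wait_time_ms
    FROM sys.dm_os_latch_stats
    WHERE latch_class = N'NESTING_TRANSACTION_FULL';

    -- <your parallel query here> OPTION (MAXDOP 2);
    -- <your parallel query here> OPTION (MAXDOP 8);

    SELECT latch_class, waiting_requests_count, wait_time_ms
    FROM sys.dm_os_latch_stats
    WHERE latch_class = N'NESTING_TRANSACTION_FULL';

If elapsed time stops improving while wait_time_ms keeps climbing, the extra workers are only adding contention.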
For years I’ve seen people in the community state that running queries at a MAXDOP that’s too high can be harmful, and I’ve always been after simple demos that show why that happens. The NESTING_TRANSACTION_FULL latch is an excellent example of why some queries run longer when MAXDOP is increased too far: there’s simply too much contention over a shared resource.
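When a workload does hit this wall, the cap can be applied per statement or per database rather than server-wide; the values and table name below are illustrative, not recommendations:

    -- Per statement, via a query hint (table name is hypothetical):
    -- SELECT COUNT(*) FROM dbo.SomeBigTable OPTION (MAXDOP 4);

    -- Per database (SQL Server 2016 and later):
    ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;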
Read the whole thing.