Understanding NESTING_TRANSACTION_FULL

Joe Obbish digs into the NESTING_TRANSACTION_FULL latch, explains what it does, and shows how it might be a performance bottleneck:

We know that only one worker can get the exclusive latch for the transaction at a time. Let’s use a greatly simplified model for what each parallel worker does for this query. It reads a row, does processing for a row, and goes on to the next one. Once it has enough rows to write out a log record it tries to acquire the latch. If no one else has the latch in exclusive mode it can get the latch, update some structure in the parent transaction, release the latch, and continue reading rows. If another worker has the latch in exclusive mode then it adds itself to the FIFO wait queue for the latch subresource and suspends itself. When the resource is available the worker status changes from SUSPENDED to RUNNABLE. When it changes again from RUNNABLE to RUNNINGit acquires the latch, updates some structure in the parent transaction, releases the latch, and continues working until it either needs to suspend again or hits the end of its 4 ms quantum. When it hits the end of its 4 ms quantum it will immediately select itself to run again because there are no other runnable workers on the scheduler.

So what determines the level of contention? One important factor is the number of workers that are contending over the same subresource. For this latch and type of query (rows are pretty evenly distributed between worker threads), this is simply MAXDOP. There’s a tipping point for this query where adding more workers is simply counterproductive.

For years I’ve seen people in the community state that running queries at MAXDOPthat’s too high can be harmful. I’ve always been after simple demos that show why that can happen. The NESTING_TRANSACTION_FULL latch is an excellent example of why some queries run longer if MAXDOP is increased too far. There’s simply too much contention over a shared resource.

Read the whole thing.

Related Posts

Waiting on Stats Refreshes

Erik Darling looks at a new wait type: I mean ever really wondered just how long a query of yours waited on stats to automatically update before running? Have you ever been absolutely, positively sure that those gosh dang darn stats updates were putting your query on the trolley to Slowsville? Your wishes will be […]

Read More

The Folly of Aggregation, Wait Stats Edition

Erik Darling explains why simply looking at wait stats isn’t enough: I’m going to be honest with you, dear readers. I’ve fallen out of like with wait stats. Unless there’s a massive resource or blocking problem, they’re a bit bland. Tuning a single query, I don’t look at them at all. Sure, they can be […]

Read More

Categories

June 2018
MTWTFSS
« May Jul »
 123
45678910
11121314151617
18192021222324
252627282930