Troubleshooting Availability Group Performance Dips

Simon Su walks us through a customer scenario where transactions per second would drop several orders of magnitude for a second, and then jump back up to normal:

The “Transaction Delay” value is an accumulation of the delay of all the current transaction delay in millisecond. You can see that the “Transaction Delay” counter has the same spikes as the sudden drop of the “Transactions Created/Sec”. Its spikes indicate that at those time points the AG transactions have time delay during commits.  This gives us a very good start point. We can focus on the transaction delay in our AG performance troubleshooting.

So who causes the transaction delay? Is it primary replica, secondary replica, or other factors like network traffic?

As a must go-through step for performance troubleshooting we captured performance monitor logs to check how the performance behaved on both replicas.  We want to find out whether there is any performance bottleneck existing in primary or secondary. For example, whether CPU usage is high when transaction delay spike happens, whether disk queue length is long, disk latency is large, etc.  We expect to find something that has the same spike trend as the “Transaction Created/sec” or “Transaction Delay”. Unfortunately, we do not anything interesting. CPU usage is as low 30%, Disk speed is quite fast. No disk queue length at all. We then checked AG related counters, like the log send queue and the recovery queue as the above two links mentioned but again we do not find anything helpful.

At the endpoint, there’s a reminder that you should keep up to date on patching systems.

Related Posts

When Synchronous AG Secondaries Are Out Of Sync

David Fowler explains that just because an Availability Group is set up as synchronous, doesn’t mean you can never experience data loss on failover: The primary replica is constantly monitoring the state of it’s secondaries. With the use of a continuous ping, the primary node always knows if the secondaries are up or down. It’s when SQL detects that one of it’s […]

Read More

When A Database In An AG Has Different Query Store Settings

Erin Stellato explains how we can have a discrepancy in Query Store settings between the primary and a secondary in an Availability Group: Last week there was a question on #sqlhelp on Twitter about the status of Query Store for a database in an Availability Group. I’ve written about Query Store and Availability Groups before so if […]

Read More

Categories

April 2018
MTWTFSS
« Mar May »
 1
2345678
9101112131415
16171819202122
23242526272829
30