Troubleshooting Availability Group Performance Dips

Simon Su walks us through a customer scenario where transactions per second would drop several orders of magnitude for a second, and then jump back up to normal:

The “Transaction Delay” value is the accumulated delay, in milliseconds, of all current transactions. You can see that the “Transaction Delay” counter spikes at the same moments that “Transactions Created/Sec” suddenly drops. Those spikes indicate that, at those points in time, AG transactions were delayed during commit. This gives us a very good starting point: we can focus on the transaction delay in our AG performance troubleshooting.
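Because the counter is an accumulation across all in-flight transactions, it can help to normalize it into a per-transaction figure. The sketch below assumes the commonly documented approach of dividing “Transaction Delay” by “Mirrored Write Transactions/sec” sampled over the same interval; the function name and the sample values are hypothetical and are not taken from the customer case.

```python
def avg_delay_per_transaction_ms(transaction_delay_ms, mirrored_write_tx_per_sec):
    """Approximate the average commit delay per transaction in milliseconds.

    Assumption: both values come from the same sampling interval of the
    "Transaction Delay" and "Mirrored Write Transactions/sec" counters.
    Returns 0.0 when no transactions were recorded, to avoid dividing by zero.
    """
    if mirrored_write_tx_per_sec <= 0:
        return 0.0
    return transaction_delay_ms / mirrored_write_tx_per_sec


# Hypothetical samples: (Transaction Delay, Mirrored Write Transactions/sec).
samples = [(120.0, 400.0), (118.0, 395.0), (9500.0, 4.0), (125.0, 410.0)]

for delay_ms, tx_per_sec in samples:
    per_tx = avg_delay_per_transaction_ms(delay_ms, tx_per_sec)
    print(f"delay={delay_ms:>8.1f} ms  tx/sec={tx_per_sec:>6.1f}  avg per tx={per_tx:.2f} ms")
```

A sample in which the total delay jumps while throughput collapses stands out far more clearly once it is expressed per transaction.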

So who causes the transaction delay? Is it primary replica, secondary replica, or other factors like network traffic?

As a mandatory step in performance troubleshooting, we captured Performance Monitor logs to check how both replicas behaved. We wanted to find out whether there was a performance bottleneck on the primary or the secondary: for example, whether CPU usage was high when the transaction delay spiked, whether the disk queue length was long, whether disk latency was high, and so on. We expected to find something with the same spike trend as “Transactions Created/Sec” or “Transaction Delay”. Unfortunately, we did not find anything interesting. CPU usage was as low as 30%, the disks were quite fast, and there was no disk queue length at all. We then checked the AG-related counters, like the log send queue and the recovery queue, as the above two links mentioned, but again we did not find anything helpful.
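Looking for counters that share a spike trend with “Transaction Delay” can be done by eye in Performance Monitor, but it can also be approximated numerically once the log has been exported. The following is a minimal sketch, assuming the counter samples are already loaded as equal-length lists; the function names, counter labels, and values are all hypothetical, and a plain Pearson correlation is only one simple way to compare trends.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length and at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series cannot follow any trend
    return cov / sqrt(var_x * var_y)


def rank_counters(reference, candidates):
    """Sort candidate counters by how strongly they track the reference series."""
    scores = {name: pearson(reference, series) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical samples taken at the same timestamps on one replica.
transaction_delay = [100, 110, 9000, 105, 95]
candidates = {
    "cpu_percent": [30, 29, 31, 30, 30],
    "disk_queue_length": [0, 0, 0, 0, 0],
    "some_other_counter": [1, 1, 50, 1, 1],
}

for name, score in rank_counters(transaction_delay, candidates):
    print(f"{name}: {score:+.2f}")
```

Counters that score near zero, as CPU and disk queue length would in a case like the one described, can be set aside quickly, leaving the search to focus elsewhere.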

At the endpoint, there’s a reminder that you should keep up to date on patching systems.

Related Posts

Workarounds for Updating Stats on Secondaries

Niko Neugebauer wants statistics updates on tables running on readable Availability Group secondary nodes: Let’s list the basic known details for the possible solution (for the Enterprise Edition of SQL Server, that is): – We can make the secondary replica readable and read the same data on it. (Not that you should do that by default, […]


Dealing with HADR_SYNC_COMMIT Waits

Dmitri Korotkevitch walks us through the HADR_SYNC_COMMIT wait type: The secondary nodes may be configured using asynchronous or synchronous commit. With asynchronous commit, a transaction is considered committed, and all locks are released, when the COMMIT log record is hardened on the primary node. SQL Server sends the COMMIT record to the secondary node; however, it does not […]

