Last week, I received an alert that the percentage of transaction log in use on one of our production databases was increasing more than it should have been. I’ll refer to this database as DB1 from here on in. It had reached 18%, which is above the normal 5-10% we like to see it at. DB1 is in an Always On Availability Group so if the log usage jumps we jump.
The first thing I checked was that the log backups were running. We use Ola Hallengren’s maintenance solution to manage our backups. The backup jobs were running without error.
The next thing to check was the log_reuse_wait_desc column in sys.databases.
The highlighted row is the row for DB1. A log_reuse_wait of 2 means there were no changes to backup since the last log backup. If that were the case then why was my transaction log slowly filling like an unattended bath?
Read on to learn more. James also has a link to a (closed/won’t-fix) Connect item.
What is a distributed availability group? Distributed availability groups allows you to create two different AGs on different Windows Server failover clusters (WSFCs) but join them together – if you will, an AG of AGs. This is great for disaster recovery scenarios where you do not want to worry (more than you should) about things like voters and quorum in a single WSFC which could make the configuration more complex. This also allows for different versions of Windows Server (one WSFC is Windows Server 2012 R2 and another Windows Server 2016). Heck, it could even facilitate migrations to new hardware/the public cloud/virtualization assuming the same major version of SQL Server. It’s a cool feature. You can only manually fail over the AG between the WSFCs, but that’s OK, and you do need a listener for each AG. If you do not plan on using a listener, you cannot create a distributed AG.
Side note: when I read DAG, I think directed acyclical graph. Maybe I’m just weird that way…
Main note: the idea of “seeding” an Availability Group sounds wonderful.
You work with Database Mirroring or AlwaysOn AG, and you want to make sure your end users work only on the secondary server. How should you do that?
This solution feels a little hacky to me. There’s enough value in it that I could see companies doing this, but it’d be nice if there were an easier way.
This architecture can be used when your organization does not value their secondary data center the same as the primary. It is a best practice to have matched or similar hardware between your primary and disaster recovery sites but that is not always possible. When costs need to be reduced it is better to have one failover server that you know can handle the work load rather than two servers which are under powered. Under powered hardware can easily become an effective outage if they cause timeouts as soon as a production work load is placed on them.
For many small to medium-sized enterprises, this might be the easiest route to sell to management—it’s hard to get management to have “redundant” servers which normally don’t get used.
The Availability Group will handle the data synchronization between the data centers. Asynchronous mode is recommend due to potential network latency hindering primary site performance. With the combination of these two features, you meet HA with the FCI’s automatic failover locally and DR with manual failover of the Availability Group between sites.
Derik’s conclusion is important: this isn’t the type of thing a brand new accidental DBA should try to build; there are a lot of moving parts here. But if you want a robust solution and have the support people available to handle the relative complexity, this is a good option.
NOTE: SQL Server 2016 introduces load-balancing across the secondary replicas in your AlwaysOn Availability Group. To utilize this feature, the above routing list will need to be modified so that SQL Server directs incoming read-only connections across the servers you choose to be load-balanced. You will have to use nested parentheses around the server instances you want to be part of the load-balanced group.
Read-only routing requires some additional setup and possibly load changes, but it can give you a nice performance gain by off-loading some of your read requests.