Press "Enter" to skip to content

Category: Availability Groups

Availability Group Backup Preferences

Shaun Stuart points out that the Backup Preferences tab of the Availability Group Properties for an AG is a little tricky:

The default, and the way my AG was configured, was Prefer Secondary. As the image shows, this means backups will be made on the secondary, unless the secondary is unavailable, in which case, they will be made on the primary.

There are a couple of things to note when you use this setting:

  1. Full backups made on the secondary are Copy Only backups. This means they won’t reset the differential bitmap and your differentials will continue to increase in size until a full backup is made on the primary.

  2. Differential backups cannot be made on the secondary.

  3. Transaction log backups can be made on the secondary and they do clear the log, so your log file will not continue to grow.

Read on for more details.

Comments closed

Joining Availability Groups

Chris Lumnah troubleshoots an error in automatic seeding of an Availability Group:

In my lab, I decided to play around with the automatic seeding functionality that is part of Availability Groups. This was sparked by my last post about putting SSISDB into an AG. I wanted to see how it would work for a regular database. When I attempted to do so, I received the following error:

Cannot alter the availability group ‘Group1’, because it does not exist or you do not have permission. (Microsoft SQL Server, Error: 15151)

Read on for the answer; it turns out automatic seeding itself was not the culprit.

Comments closed

Availability Group Agent Alerts

Tracy Boggiano builds a set of SQL Server alerts related to Availability Group happenings and issues:

For Availability Groups we have a few extra error numbers we care about. Error number 1480 tells when a server changes roles, so we can know when a server flips from a secondary to a primary, or from a primary to a secondary. Error number 35264 tells when data movement has suspended on any database. This can occur for many reasons. One I have seen is when you have expanded your mount point on your primary and the data or log file runs out of space on the secondary the data or log file can not expand on the secondary because you forgot to expand the secondary. Error number 35265 tells you when the data movement has resumed on any database.  Error number 41404 let’s you know if your AG is offline which can be bad if you expected an automatic failover.  Error number 41405 let’s you know if an Availability Group can’t automatically failover for any reason.  In the later to cases you will want to look at your SQL Error logs and AlwaysOn Extended Events Health session.

Click through for the alert scripts.

Comments closed

Availability Group Tips

Derik Hammer has some tips to help you learn about Availability Groups:

3. Use MultiSubnetFailover=true

The Availability Group Listener is technically an optional component of an Availability Group. However, in my opinion it is necessary. By default, your listener will register all IP addresses as DNS A records and it will have multiple IP addresses when your cluster crosses subnets, most commonly when you have disaster recovery between data centers. Using the MultiSubnetFailover=true parameter in your client connection strings will attempt to connect to all IP addresses and completes the connection on the first thread to succeed. The listener ensures that only one IP address is online at a time, therefore you always connect to correct node.

This feature effectively bypasses the limitations of your DNS cache. Traditionally, you would cache the IP address for a DNS record. When you needed the client to connect to a different IP address using the same virtual network name, you would have to wait for the time to live setting to expire. This would delay your recovery time. With the MultiSubnetFailover setting, you can still cache your IP addresses but without the delay that they could induce.

There’s some good reading here.

Comments closed

Perfmon Counters For Monitoring AGs

Tracy Boggiano has a set of Perfmon counters she uses to monitor Availability Groups:

Monitoring Availability Groups can be tricky.  The DMVs don’t update the log_send_rate and redo_rate all the time especially if nothing is happening so if you try to use those for calculations when monitoring you could false results worse yet pages in the middle of the night.  In order to calculate the log_send_rate and redo_rate you need to capture the perfmon counters ‘Log Bytes Flushed/sec’, ‘Redone Bytes/sec’, and ‘Log Bytes Received/sec’ into temporary tables WAITFOR 1 second then capture them again.  Below is query that captures this along side what you see in the DMV for when Microsoft might fix the issue.

Click through for a script as well as a Powershell cmdlet wrapper for running against a set of hosts in your Central Management Server.

Comments closed

Troubleshooting AG Creation Failure

Anthony Nocentino digs into logs to troubleshoot a failure when trying to create an Availability Group:

Now we have some data to look through!

When we look at the contents of the cluster logs, we’re totally on the other side of the spectrum when it comes to information verbosity. The logs so far have been pretty terse and haven’t really told us about what’s causing the failure…well dig through this log and you’ll likely find your reason and a lot more information. Good stuff to look at to get an understanding of the internals of WSFCs. Now for the the reason my Availability Group creation failed was permissions. Check out the log entries.

It’s a good troubleshooting guide.

Comments closed

Recompiling Memory-Optimized Procedures On AGs

Ned Otter takes note of how natively compiled stored procedures differ from traditional stored procedures, especially on Availability Group secondary nodes:

As of SQL 2016, the database engine automatically updates statistics for memory-optimized tables (documentation here), but recompilation of native modules must still be performed manually. But hey, that’s way better than SQL 2014, when you couldn’t recompile at all; you had to drop/recreate the native module. And natively compiled stored procedures don’t reside in the plan cache, because they are executed directly by the database engine.

This post attempts to determine if the requirement to manually recompile native modules is any different for AG secondary replicas.

The results are interesting.

Comments closed

Disappearing Availability Groups

Cody Konior has a not-so-great magic trick:

I logged onto that node and the AG Dashboard looked okay at first glance. But the test was still failing when I re-ran it manually, so, I looked deeper.

I logged onto the second node and noticed the AG was completely gone. All the databases were in recovery but there was no sign of the AG at all. Nothing. Nada. Zip. (I don’t have any other words). It’s like it was never there.

At first I thought someone must have done something awful. I quickly poured a coffee while checking the default trace which usually records system-level configuration changes like dropping an entire replica but in this case nothing relevant showed up.

Read on for the answer, as well as action items to take if you’re actively using Availability Groups.

Comments closed

AG Secondary Stats Overwritten With Sample

Taiob Ali has run into an interesting issue:

Once I update my statistics with fullscan, with in 10~20 seconds some of the statistics on the same table are getting update on secondary with a sample pecent of rows. Meaning my best statistics are being overwritten with good (full vs sample) statistics. On primary node once I run “Update statistics Tablename with fullscan” . I see following about statistics status.

After 10~20 seconds of updating statistics in primary node if I check the status of the same on my secondary nodes, I see fullscan statistics is replaced by sample statistics. Look at the rows_sampled and last_updated column, you will see the sample row number and last_updated column time is within few seconds of update in primary. RowsModified column still showing zero records.

It’s happening on an Availability Group secondary.  Taiob has a workaround, so read on for that.

Comments closed

Clusterless Availability Groups

Allan Hirt talks about clusterless AGs:

A quick history lesson: through SQL Server 2016, we have three main variants of AGs:

  • “Regular” AGs (i.e. the ones deployed using an underlying Windows Server failover cluster [WSFC] requiring Active Directory [AD]; SQL Server 2012+)
  •  AGs that can be deployed without AD, but using a WSFC and certificates (SQL Server 2016+ with Windows Server 2016+)
  • Distributed AGs (SQL Server 2016+)

SQL Server v.Next (download the bits here) adds another variant which is, to a degree, a side effect of how things can be deployed in Linux: AGs with no underlying cluster. In the case of a Windows Server-based install, this means that there could be no WSFC, and for Linux, currently no Pacemaker.

Read on for more details, including limitations and expectations.

Comments closed