Dealing with HADR_SYNC_COMMIT Waits

Dmitri Korokevitch walks us through the HADR_SYNC_COMMIT wait type:

The secondary nodes may be configured using asynchronous or synchronous commit. With asynchronous commit, transaction considered to be committed and all locks were released when COMMIT log record is hardened on the primary node. SQL Server sends COMMIT record to secondary node; however, it does not wait for the confirmation that the record had been hardened in the log there.

This behavior changes when you use synchronous commit as shown in Figure 1. In this mode, SQL Server does not consider transaction to be committed until it receives the confirmation that COMMIT log record is hardened in the log on the secondary node. The transaction on primary will remain active with all locks held in place until this confirmation is received. The session on primary is suspended with HADR_SYNC_COMMIT wait type.

Click through for the full story.

Considerations Before Failing Over an AG

David Fowler points out a few things to look at before failing over an Availability Group:

What about your SQL Agent jobs? Have you got jobs that perform actions on your data? If you have, do those jobs exist on the new primary? If they don’t then I’m happy to bet that whatever function that they were playing probably isn’t happening anymore.

One thing that I always want to make sure before I failover is, do I have all the relevant jobs ready to roll on the secondary server?

But what about the jobs that you’ve got on the old primary? There’s a fair chance that, if they’re doing any sort of data manipulation, they’re going to be failing.

Click through for additional considerations.

Connecting with Read Intent

John McCormack shows two ways to connect to an Availablity Group listener with read-only intent:

SQLCMD
The -Kreadonly switch is your key to success here but remember to also specify the database using -d. When not set (and with an initial catalog of master for my login), I found I always got the primary instance back during my check. This simple omission cost me hours of troubleshooting work, because I was convinced my listener wasn’t working correctly. In fact, I just wasn’t testing it correctly.

There’s some good information in here for sqlcmd and for SQL Server Management Studio.

The Distribution Database and AGs

Andy Mallon looks at a wrinkle Availability Groups adds to replication:

This has always worked super, and has been a go-to query for me for years. But when looking at a SQL Server 2017 distributor with the distribution database in an Availability Group, that wasn’t working. All the publications had a publisher_id of 1, but in sys.serversserver_id 1 was some random linked server, and definitely not the publisher. But replication was working great. Maybe replication was set up on the other AG replica, and server_id 1 came from there.

Nope. On the other replica, it was the same story. Server_id 1 was a random linked server, and nothing to do with replication at all, let alone the publisher. But replication was working perfectly. A teammate fooling around with it in dev confirmed that if he updated the publisher_id to match the server_id we thought it should join to, replication stopped working. So, that publisher_id of 1 was correct. Or special. But also definitely different than what I’ve seen in prior versions of SQL Server.

Read on to see what Andy learned.

CLR Assembly Failure After Availability Group Failover

Paul Randal walks us through a permissions issue after an Availability Group failover:

Jonathan was working with a client recently who experienced a CLR assembly failure after an AG failover and needed to figure out why. They’d been testing their AG disaster recovery strategy and ran into an unexpected problem with their application which relies heavily on SQLCLR and an UNSAFE assembly that calls a web service from inside SQL Server.  When they failed over their AG to their DR server, the CLR assembly failed with the following error:

An error occurred in the Microsoft .NET Framework while trying to load assembly id 65546. The server may be running out of resources, or the assembly may not be trusted with PERMISSION_SET = EXTERNAL_ACCESS or UNSAFE. Run the query again, or check documentation to see how to solve the assembly trust issues. For more information about this error: System.IO.FileLoadException: Could not load file or assembly ‘sqlclr_assemblyname, Version=1.0.0.0, Culture=neutral, PublicKeyToken=fa39443c11b12591’ or one of its dependencies. Exception from HRESULT: 0x80FC80F1

Read on to see the root cause and what you can do to correct it.

Comparing Oracle RAC to SQL Server Availability Groups

Kellyn Pot’vin-Gorman explains the difference between Oracle RAC and SQL Server Availability Groups:

There is a constant rumble among Oracle DBAs- either all-in for Oracle Real Application Cluster, (RAC) or a desire to use it for the tool it was technically intended for. Oracle RAC can be very enticing- complex and feature rich, its the standard for engineered systems, such as Oracle Exadata and even the Oracle Data Appliance, (ODA). Newer implementation features, such as Oracle RAC One-Node offered even greater flexibility in the design of Oracle environments, but we need to also discuss what it isn’t- Oracle RAC is not a Disaster Recovery solution.

Click through for a good high-level contrast, as these are quite different products.

Reading From a Secondary with Azure Data Studio

Mohammad Darab shows how you can access a readable secondary using Azure Data Studio:

Recently, I had to use Azure Data Studio to access a application intent read only secondary replica. I had to use Azure Data Studio because I was using a Mac. I usually use SSMS on my Windows machines. If you want to connect with the “applicationintent=readonly” property via SQL Server Management Studio, you do so by typing it out in the “Additional Connection Parameters” as shown in the screenshot below:

Since I am fairly new to Azure Data Studio I was fumbling my way around to find the equivalent setting. And I finally found it…

Read on to see how you can set this in Azure Data Studio.

Improving The Azure Automated AG Experience

Allan Hirt would like to see a few improvements to the experience when creating availability groups on Azure VMs:

What does this mean? To have a supported WSFC-based configuration (doesn’t matter what you are running on it – could be something non-SQL Server), you need to pass validation. xFailOverCluster does not allow this to be run. You can create the WSFC, you just can’t validate it. The point from a support view is that the WSFC has to be vetted before you create it. Could you run it after? Sure, but you still have no proof you had a valid configuration to start with which is what matters. This is a crucial step for all AGs and FCIs, especially since AGs do not check this whereas the installation process for FCIs does.

If you look at MSFT_xCluster, you’ll see what I am saying is true. It builds the WSFC without a whiff of Test-Cluster. To be fair, this can be done in non-Azure environments, too, but Microsoft givs you warnings not to do that for good reason. I understand why Microsoft did it this way. There is currently no tool, parser, or cmdlet to examine the output of Test-Cluster results. This goes back to why building WSFCs is *very* hard to automate.

I’m not sure how easy some of these fixes would be, but they’d definitely be nice.

When Synchronous AG Secondaries Are Out Of Sync

David Fowler explains that just because an Availability Group is set up as synchronous, doesn’t mean you can never experience data loss on failover:

The primary replica is constantly monitoring the state of it’s secondaries. With the use of a continuous ping, the primary node always knows if the secondaries are up or down.

It’s when SQL detects that one of it’s synchronous replicas goes offline is when interesting things can happen.

So here’s the discussion that came up, if a synchronous replica goes offline for whatever reason, SQL won’t be able to commit any transactions and that means we can be confident that the secondary is up to date, right?

Read on to learn the answer. Which is “no.” But David explains why, so you should read that instead of just having me say it.

When A Database In An AG Has Different Query Store Settings

Erin Stellato explains how we can have a discrepancy in Query Store settings between the primary and a secondary in an Availability Group:

Last week there was a question on #sqlhelp on Twitter about the status of Query Store for a database in an Availability Group. I’ve written about Query Store and Availability Groups before so if you’re not familiar with QS behavior in an AG, check out that post first. But this question was, I think, specific to the values that shows on a read-only replica and how there were different query store settings between a primary and secondary. Let’s set it up and take a look.

Click through to learn why this may be and why you shouldn’t panic.

Categories

June 2019
MTWTFSS
« May  
 12
3456789
10111213141516
17181920212223
24252627282930