Press "Enter" to skip to content

Category: Availability Groups

Working with Read-Only Endpoints in Azure SQL Database

Arun Sirpal takes us through one method for improving performance in Azure SQL Database:

One of the main benefits of configuring active geo-replication for Azure SQL Database is leveraging the read-only endpoint, a good technique to split away read only activity from OLTP based workloads. This means that there is no reason why you cannot point users to these databases via tools such as Power BI as highlighted below.

But there are some things to keep in mind, as Arun points out.

Comments closed

Fixing Availability Group Issues with Alerts

Wayne Sheffield automates a few problems away:

The first of the Availability Group issues to discuss is that, for whatever reason, data is no longer moving between the primary replica and a secondary replica. This puts the Data Movement in a Suspended state.

If the data movement remains suspended for too long, you might have to take some undesired actions to get things back in sync. Things like removing the database from the AG, restoring log files, then reattaching it to the AG. When the data movement becomes suspended, we want to get it flowing again as soon as possible. Let’s have SQL Server try to get the data flowing again.

Read on for more, including a second issues that Wayne helps solve.

Comments closed

Lessons Learned from Migrating to SQL Server 2017 with Availability Groups

Lee Markum has an after-action report:

In late 2019, a project that had been in progress for well over a year finally came to a conclusion.  I had collaborated with a number of people to migrate a stand alone SQL Server 2008 and two stand alone SQL Server 2008 R2 instances.  Each stand alone SQL Server was migrated to a three node Availability Group. Here are a few things learned along the way.

Click through for some good advice.

Comments closed

A Note on Distributed Network Names

Allan Hirt provides an explanation around Distributed Network Names when building Windows Server Failover Clusters on Windows Server 2019:

The new Windows Server 2019 DNN functionality does have a side effect that does affect Azure-based configurations. When creating a WSFC, Windows Server 2019 detects that the VM is running in Azure and will use a DNN for the WSFC name. This is the default behavior.

I clipped this paragraph specifically because Allan uses both “affect” and “effect” correctly, and I wanted to call that out. Do read the rest of it as well.

Comments closed

Availability Group Bug when Removing and Adding the Same Database

Josh Darnell takes us through a tricky problem:

I came across a bug in SQL Server 2016 where the Availability Group (AG) health check can get stuck in an infinite loop after removing and re-adding a database from an AG.

Unfortunately, I don’t know exactly what version this bug was introduced. I first noticed the problem on SQL Server 2016 SP2 CU7 GDR (13.0.5366.0). It may have existed before then, but I never encountered it.

Read on for a workaround. And hopefully there will be a proper fix soon. Also, it’d be interesting to see if it can be reproduced in 2017 or 2019.

Comments closed

Frequently Asked Availability Group Questions

Sean Gallardy answers the voices swarming in his head:

Q3: Do my secondary replicas really need to be the same level of hardware as my primary replicas?
Sean: They should, this assumes you are eventually going to fail over to them or they might need to run your workload. It does *not* assume they will be used for readable workloads. Remember that the secondary replica is already accepting and writing log data, redoing that data, and then if you’re running a bunch of read workloads on top of that it might actually be *busier* than your primary. Yes, I’ve witnessed secondary replicas getting hammered harder than their associated primary.

Read on for informative and entertaining answers, and avoid those rusty nails.

Comments closed

SQL Agent Job Control Based on AG/Mirroring Status

Eitan Blumin has an interesting solution for us:

Lo and behold, my new and improved scripts, which implement the following logic:

Create a single Master Control Job with an hourly schedule, and also set it to run when SQL Agent is started up. The Master Control Job will do the following:

Automatically detect which jobs have steps that run on databases that are involved in an HADR solution.

For each such job, automatically detect whether there’s at least one step run on a database which currently has the Primary/Principle role.

If so, make sure it’s enabled. Otherwise, make sure it’s disabled.

Create an alert for a role/state change event, if such doesn’t exist yet, and set it to run the Master Control Job when triggered.

Click through for more details and links to scripts for Availability Group and database mirroring scenarios.

Comments closed

Preventing Unexpected Failover with AG Patching

Josh Darnell dealt with an issue with Availability Groups:

I had a 2-node availability group (AG) + fileshare witness system experience an unexpected failover recently.

The synchronous secondary was being patched, and when it came back up from a reboot, the current primary unexpectedly failed over. We weren’t done with all the patching on the secondary, so this caused a short outage, and we had to fail back to the original primary to finish the patching (which is of course another short interruption in availability).

The root cause was interesting enough that I decided to share the story here, and provide some general advice and debugging tips along the way.

Click through to understand why this happened and how you might be able to avoid the pain Josh experienced.

Comments closed

Thread Pool Exhaustion and Availability Groups

Sean Gallardy lays down the gauntlet:

You’re probably wondering why you couldn’t spawn a new thread, why this error happened, why you shouldn’t just up the max worker threads, and probably also thinking this must be some kind of “bug” in SQL Server. So here’s where our awkward conversation starts… It’s you. Contrary to every relationship breakup you’ve ever had, it’s definitely you. I’m not saying this to be mean but to really drive the point home here. The major reasons for this occurring are large blocking chains, too much workload for the server size (databases, users, etc.), and/or your virtual infrastructure sucks. There aren’t too many reasons for getting yourself into this situation, and while what I’ll be putting forth here isn’t exhaustive of all edge cases and scenarios, these are by far the majority of all the items in the wild that I’ve either worked on or have been involved in at some level. Side Note: If you’ve read this far, are shaking your head, calling me names that an irate sailor might utter, and telling yourself that upping the max worker threads as the product error suggests and Microsoft should fix their bugs then you can stop reading here as you’re probably not open to learning why you have issues in your environments.

One more scenario I’ve seen is mirroring thousands of databases on a single instance. That scenario fit none of Sean’s criteria—there was very little blocking, most of the databases were small and infrequently-used, and the infrastructure was the right size. It was just a huge number of databases and each database requiring a minimum of X worker threads. Mind you, it was still a bad idea…

Comments closed