Jim Donahoe shows off Failover Groups in Azure SQL Database. Part 1 involves setting up a Failover Group:
In my former company, we had 22 web applications that all had connections to various databases. We had all of our databases configured for Geo-Replication already, but still if we had to failover, we had to update each connection string for the web apps along with others which became a tedious process. In came Failover Groups to the rescue! With a Failover Group, I was able to create two endpoints that stayed the same no matter which server was primary/secondary. I liked to think of these as my Availability Group Listeners as they kinda serve the same functionality: Route traffic to a node depending on if its read-only or not. Best part? It’s configured through the Azure Portal SO EASILY! You can use PowerShell as well, but for this blog post, I will walk through the creation via the Portal. I will make a separate post or attach a script at some point for the PowerShell deployment.
Before we start the configuration portion of this though, let’s take a look at how Microsoft defines what a Failover Group is. I found this definition here: “Azure SQL Database auto-failover groups (in-preview) is a SQL Database feature designed to automatically manage geo-replication relationship, connectivity, and failover at scale.” Sounds pretty interesting, right? Let’s make one!
Well, now that the easy stuff is out of the way, let’s talk about how you connect to these groups via SSMS. This is where some of the confusion happens. When I first configured a Failover Group, the first thing I tried to do was connect to the Primary server via SSMS thinking it will work just like an Always On Listener in traditional SQL Server…NEWP!
If you’re running a production database on Azure SQL Database, you might want to look at Failover Groups.
The AlwaysOn Health Events were consistent with the SQL Server Logs. I discovered that the Windows Server Failover Cluster had been down (but was back online), and that the AlwaysOn Group had failed over and back. But the culprit to my secondary replica and databases being disconnected was an endpoint issue. I found in my SQL Server logs the service account running AlwaysOn had for some reason lost its permissions to connect to the endpoint:
Click through for the entire troubleshooting process as well as the solution.
The Latency data collection functionality and the associated reports allows a database administrator to quickly discern the bottleneck in the log transport flow between the Primary and the Secondary replicas of an Availability Group. This feature does NOT answer the question “Is there latency in the Availability Group deployment?” but rather provides a way to understand why there is latency in the Availability Group Deployment. This functionality provides a way to narrow down the potential cause of latency in an Availability Group deployment.
There are some things that this report doesn’t capture, but it does give us a bit more insight.
Monitoring can be tricky with distributed AGs because of the way it shows up in SSMS. The distributed AG created on the primary AG does not have the ability to show you a dashboard to monitor traffic like the typical AG and the secondary replicas do not even show the distributed AG. So, to monitor the distributed AG you are best to head to the DMVs. DMV sys.availability_groups has the is_distributed column that will allow you to only see the distributed AGs. Below is a query adapted from the Microsoft Docs query to add additional information to check on the status of the distributed AGs which you must run on the primary AG.
Click through for more details.
The function takes a replica name as input and queries system tables for Availability Groups running as secondary that are online, healthy, and synchronous. For each AG found, the function generates an ALTER AVAILABILITY GROUP statement. If the -noexec parm is set to 0, the command will be executed. If -noexec is set to 1, the command will be written out to a file.
When writing the function, I started out trying to use the native PowerShell Availability Group cmdlets. After several false starts, I found it easier to develop the T-SQL code in Management Studio and use Invoke-Sqlcmd to execute the code. The code is available below. I hope you can put it to use.
Click through for the script.
Read-Scale availability groups are ones where we don’t want the availability group for high-availability or disaster recovery, instead, we want to use it to create multiple copies of our databases that span across multiple servers allowing for the spreading of a large read-only workload. There are various scenarios where this might be extremely valuable and in previous versions of SQL Server it was possible, though there was a requirement of using Windows Server Failover Clustering (WSFC). Read-Scale availability groups do not require the WSFC component and does not give high-availability or disaster recovery, it only acts as a mechanism (availability groups) to facilitate the synchronization of the databases across multiple servers.
To reiterate, this is not used for high-availability or disaster recovery but instead to scale your databases across multiple servers for read workloads.
The remainder of the post shows how to set up an Availability Group without the corresponding Windows Server Failover Clustering components.
Starting in Windows Server 2012 R2 you now have a way to upgrade a cluster to Windows 2016. The best part is it’s not an OS upgrade, but a rebuild. The magic is that you can join a Windows 2016 server to a Windows 2012 R2 cluster. You can upgrade your cluster with as little as one failover and thus very little down time. Everything stays in compatibility mode until all nodes are upgraded to Windows 2016 and then you upgrade the cluster functional level. This is great news for those of us running FCIs or AGs.
Click through for a listing of steps and a video.
On the plane home from PAS Summit I was sat next to someone who had also attended and when he saw on my laptop that I was part of the SQL Community we struck up a conversation. He asked me how he could compare SQL Agent Jobs across availability group replicas to ensure that they were the same.
He already knew that he could use Copy-DbaAgentJob from dbatools to copy the jobs between replicas and we discussed how to set up an Agent job to accomplish this. The best way to run an Agent Job with a PowerShell script is described here
Maybe I’m out in la-la land, but wouldn’t this be a perfect use case for MSX/TSX?
Q: Can you enable Query Store for a read-only replica?
A: No. Because the replica is read-only, and Query Store inherently writes data TO the database, you cannot enable it to capture queries that are executed against that read-only copy. I did create a Connect item for this request. If this is of interest to you, please up-vote it: Enable Query Store for collection on a read-only replica in an Availability Group. The more votes this has, the better the possibility that Microsoft will implement it, so feel free to share with your friends and have them vote too!
Read on for more questions and answers, and if you’re interested in it, vote on the Connect item above.
I recently had to modify the Endpoint URLs on our SQL Server Availability Group replicas.
The reason for this blog post is that I could not answer the following questions:
Do I need to suspend data movement prior to making this change?
Would this change require a restart of the database instance?
Now Ronald can, and after reading his post, you’ll be able to as well.