The AlwaysOn_health event session in Extended Events is intended to make analyzing problems with Availability Groups possible after they have occurred. While this event session goes a long way towards making it possible to piece together the puzzle of what went wrong in a lot of situations, it can still be a difficult task. One of the things I wish Microsoft had included in the AlwaysON_health event session definition is the sqlserver.server_instance_name action for every event, and this is something that I usually recommend clients add to each of their AG servers using a script after I work with them the first time. Why would this be useful, since if the files come from a specific server we should know the events are for that server right? Well, when we are working with AG configurations with more than two servers and trying to see the big picture of what is happening across the servers, it can be difficult to follow timelines with multiple files from multiple servers open. It’s not impossible, but it does make things more difficult.
Click through to see how to do this through the UI or via T-SQL.
I’ve been hearing about round-robin read-only routing ever since SQL 2016 came out but whenever I tried to test if it’s working it never seemed to be. But now I know exactly how it works and there’s a few loopholes where it may not trigger, and they’re not the documented ones you’re thinking of.
To test the limits of it you’re going to need:
- PowerShell 5.1
- Pester 4 (
Install-Module Pester -Force)
- DbData (
Install-Module DbData -Force)
I’ll explain any of the Pester and DbData bits along the way so don’t worry. They’re minor framework stuff.
There’s some good stuff here around connection pooling, so check it out.
If you happen to be managing SQL Servers with a large number of databases and availability groups, it can sometimes be difficult to keep track of which database belongs to which availability group.
sp_WhatsMyAG will tell you just that. You can either provide it with the database name and it’ll tell you the AG that your specified database belongs to or you can leave the parameter NULL and you’ll get the AG of the database whose context you’re currently in.
Click through for the script.
Jim Donahoe shows off Failover Groups in Azure SQL Database. Part 1 involves setting up a Failover Group:
In my former company, we had 22 web applications that all had connections to various databases. We had all of our databases configured for Geo-Replication already, but still if we had to failover, we had to update each connection string for the web apps along with others which became a tedious process. In came Failover Groups to the rescue! With a Failover Group, I was able to create two endpoints that stayed the same no matter which server was primary/secondary. I liked to think of these as my Availability Group Listeners as they kinda serve the same functionality: Route traffic to a node depending on if its read-only or not. Best part? It’s configured through the Azure Portal SO EASILY! You can use PowerShell as well, but for this blog post, I will walk through the creation via the Portal. I will make a separate post or attach a script at some point for the PowerShell deployment.
Before we start the configuration portion of this though, let’s take a look at how Microsoft defines what a Failover Group is. I found this definition here: “Azure SQL Database auto-failover groups (in-preview) is a SQL Database feature designed to automatically manage geo-replication relationship, connectivity, and failover at scale.” Sounds pretty interesting, right? Let’s make one!
Well, now that the easy stuff is out of the way, let’s talk about how you connect to these groups via SSMS. This is where some of the confusion happens. When I first configured a Failover Group, the first thing I tried to do was connect to the Primary server via SSMS thinking it will work just like an Always On Listener in traditional SQL Server…NEWP!
If you’re running a production database on Azure SQL Database, you might want to look at Failover Groups.
The AlwaysOn Health Events were consistent with the SQL Server Logs. I discovered that the Windows Server Failover Cluster had been down (but was back online), and that the AlwaysOn Group had failed over and back. But the culprit to my secondary replica and databases being disconnected was an endpoint issue. I found in my SQL Server logs the service account running AlwaysOn had for some reason lost its permissions to connect to the endpoint:
Click through for the entire troubleshooting process as well as the solution.
The Latency data collection functionality and the associated reports allows a database administrator to quickly discern the bottleneck in the log transport flow between the Primary and the Secondary replicas of an Availability Group. This feature does NOT answer the question “Is there latency in the Availability Group deployment?” but rather provides a way to understand why there is latency in the Availability Group Deployment. This functionality provides a way to narrow down the potential cause of latency in an Availability Group deployment.
There are some things that this report doesn’t capture, but it does give us a bit more insight.
Monitoring can be tricky with distributed AGs because of the way it shows up in SSMS. The distributed AG created on the primary AG does not have the ability to show you a dashboard to monitor traffic like the typical AG and the secondary replicas do not even show the distributed AG. So, to monitor the distributed AG you are best to head to the DMVs. DMV sys.availability_groups has the is_distributed column that will allow you to only see the distributed AGs. Below is a query adapted from the Microsoft Docs query to add additional information to check on the status of the distributed AGs which you must run on the primary AG.
Click through for more details.
The function takes a replica name as input and queries system tables for Availability Groups running as secondary that are online, healthy, and synchronous. For each AG found, the function generates an ALTER AVAILABILITY GROUP statement. If the -noexec parm is set to 0, the command will be executed. If -noexec is set to 1, the command will be written out to a file.
When writing the function, I started out trying to use the native PowerShell Availability Group cmdlets. After several false starts, I found it easier to develop the T-SQL code in Management Studio and use Invoke-Sqlcmd to execute the code. The code is available below. I hope you can put it to use.
Click through for the script.
Read-Scale availability groups are ones where we don’t want the availability group for high-availability or disaster recovery, instead, we want to use it to create multiple copies of our databases that span across multiple servers allowing for the spreading of a large read-only workload. There are various scenarios where this might be extremely valuable and in previous versions of SQL Server it was possible, though there was a requirement of using Windows Server Failover Clustering (WSFC). Read-Scale availability groups do not require the WSFC component and does not give high-availability or disaster recovery, it only acts as a mechanism (availability groups) to facilitate the synchronization of the databases across multiple servers.
To reiterate, this is not used for high-availability or disaster recovery but instead to scale your databases across multiple servers for read workloads.
The remainder of the post shows how to set up an Availability Group without the corresponding Windows Server Failover Clustering components.
Starting in Windows Server 2012 R2 you now have a way to upgrade a cluster to Windows 2016. The best part is it’s not an OS upgrade, but a rebuild. The magic is that you can join a Windows 2016 server to a Windows 2012 R2 cluster. You can upgrade your cluster with as little as one failover and thus very little down time. Everything stays in compatibility mode until all nodes are upgraded to Windows 2016 and then you upgrade the cluster functional level. This is great news for those of us running FCIs or AGs.
Click through for a listing of steps and a video.