The AlwaysOn Health Events were consistent with the SQL Server Logs. I discovered that the Windows Server Failover Cluster had been down (but was back online), and that the AlwaysOn Group had failed over and back. But the culprit to my secondary replica and databases being disconnected was an endpoint issue. I found in my SQL Server logs the service account running AlwaysOn had for some reason lost its permissions to connect to the endpoint:
Click through for the entire troubleshooting process as well as the solution.
The Latency data collection functionality and the associated reports allows a database administrator to quickly discern the bottleneck in the log transport flow between the Primary and the Secondary replicas of an Availability Group. This feature does NOT answer the question “Is there latency in the Availability Group deployment?” but rather provides a way to understand why there is latency in the Availability Group Deployment. This functionality provides a way to narrow down the potential cause of latency in an Availability Group deployment.
There are some things that this report doesn’t capture, but it does give us a bit more insight.
Monitoring can be tricky with distributed AGs because of the way it shows up in SSMS. The distributed AG created on the primary AG does not have the ability to show you a dashboard to monitor traffic like the typical AG and the secondary replicas do not even show the distributed AG. So, to monitor the distributed AG you are best to head to the DMVs. DMV sys.availability_groups has the is_distributed column that will allow you to only see the distributed AGs. Below is a query adapted from the Microsoft Docs query to add additional information to check on the status of the distributed AGs which you must run on the primary AG.
Click through for more details.
The function takes a replica name as input and queries system tables for Availability Groups running as secondary that are online, healthy, and synchronous. For each AG found, the function generates an ALTER AVAILABILITY GROUP statement. If the -noexec parm is set to 0, the command will be executed. If -noexec is set to 1, the command will be written out to a file.
When writing the function, I started out trying to use the native PowerShell Availability Group cmdlets. After several false starts, I found it easier to develop the T-SQL code in Management Studio and use Invoke-Sqlcmd to execute the code. The code is available below. I hope you can put it to use.
Click through for the script.
Read-Scale availability groups are ones where we don’t want the availability group for high-availability or disaster recovery, instead, we want to use it to create multiple copies of our databases that span across multiple servers allowing for the spreading of a large read-only workload. There are various scenarios where this might be extremely valuable and in previous versions of SQL Server it was possible, though there was a requirement of using Windows Server Failover Clustering (WSFC). Read-Scale availability groups do not require the WSFC component and does not give high-availability or disaster recovery, it only acts as a mechanism (availability groups) to facilitate the synchronization of the databases across multiple servers.
To reiterate, this is not used for high-availability or disaster recovery but instead to scale your databases across multiple servers for read workloads.
The remainder of the post shows how to set up an Availability Group without the corresponding Windows Server Failover Clustering components.
Starting in Windows Server 2012 R2 you now have a way to upgrade a cluster to Windows 2016. The best part is it’s not an OS upgrade, but a rebuild. The magic is that you can join a Windows 2016 server to a Windows 2012 R2 cluster. You can upgrade your cluster with as little as one failover and thus very little down time. Everything stays in compatibility mode until all nodes are upgraded to Windows 2016 and then you upgrade the cluster functional level. This is great news for those of us running FCIs or AGs.
Click through for a listing of steps and a video.
On the plane home from PAS Summit I was sat next to someone who had also attended and when he saw on my laptop that I was part of the SQL Community we struck up a conversation. He asked me how he could compare SQL Agent Jobs across availability group replicas to ensure that they were the same.
He already knew that he could use Copy-DbaAgentJob from dbatools to copy the jobs between replicas and we discussed how to set up an Agent job to accomplish this. The best way to run an Agent Job with a PowerShell script is described here
Maybe I’m out in la-la land, but wouldn’t this be a perfect use case for MSX/TSX?
Q: Can you enable Query Store for a read-only replica?
A: No. Because the replica is read-only, and Query Store inherently writes data TO the database, you cannot enable it to capture queries that are executed against that read-only copy. I did create a Connect item for this request. If this is of interest to you, please up-vote it: Enable Query Store for collection on a read-only replica in an Availability Group. The more votes this has, the better the possibility that Microsoft will implement it, so feel free to share with your friends and have them vote too!
Read on for more questions and answers, and if you’re interested in it, vote on the Connect item above.
I recently had to modify the Endpoint URLs on our SQL Server Availability Group replicas.
The reason for this blog post is that I could not answer the following questions:
Do I need to suspend data movement prior to making this change?
Would this change require a restart of the database instance?
Now Ronald can, and after reading his post, you’ll be able to as well.
I’m a big fan of using synonyms to point to objects in other databases, but if you’ve tried to use them in conjunction with Availability Groups you may have encountered a slight issue: upon failover, the synonyms are no longer valid because the instance name is part of the synonym definition. In order for the synonyms to work again, they have to point to the instance they now reside in – assuming the databases were in the same instance, respectively.The next challenge was to automate the of detection of when the primary has changed, therefore triggering an update of the synonyms. I put the following tSQL into a SQL Agent job and set it to run every five seconds.
Click through for the SQL Agent job details. I have only created a couple synonyms and don’t really like them that much, but if they’re your bag, then at least you know you can use them safely with Availability Groups.