
Category: Availability Groups

Operating System Error 995 on Adding a Database to an AG

Andrew Pruski troubleshoots a problem:

I was adding databases to an availability group (SQL Server 2017 CU20 instance) the other day and one database failed to automatically seed to the secondary.

When I looked in the SQL Server error log I saw this error message: –

BackupIoRequest::ReportIoError: write failure on backup device ‘{GUID}’. Operating system error 995(The I/O operation has been aborted because of either a thread exit or an application request.).

Read on to see how Andrew solved the problem.


HADR_SYNC_COMMIT

Sean Gallardy lays out what HADR_SYNC_COMMIT really tells you:

Initially I thought to myself, “this is the most misunderstood wait type that exists in the HA space for SQL Server”, then I realized maybe this isn’t the case… So, I pondered over this question, “is it truly misunderstood?” and came to the (possibly incorrect) realization that it is quite accurate in the general SQL Server’s users’ space of understanding. I also concluded that, really, it’s the way the wait is used in SQL Server coupled with how waits work in SQL Server, which leads to how it is viewed. Let me explain….

You’ll definitely want to read Sean’s explanation.


Creating a Distributed Availability Group in Azure via Terraform

Sandeep Arora has some scripts for us:

To create a distributed availability group, you need two availability groups (AGs), each with its own listener, which you then combine. In this case, one availability group is on-premises and the other needs to be created in Microsoft Azure. This example doesn’t cover all of the details like creating an extended network setup between the on-premises network and Azure or joining Azure Active Directory Domain Services to an on-premises forest; instead, it highlights the key requirements for setting up the availability group in Azure and then configuring the distributed AG between the on-premises availability group (represented as AOAG-1) and the Azure availability group (represented as AOAG-2).

Click through for the preparations you need in place and a set of scripts to do the work.


Against sp_hexadecimal and sp_help_revlogin

Andy Mallon says it’s time to give up a couple of procedures:

We recently ran into some performance problems with our login sync, which is based on sp_hexadecimal and sp_help_revlogin, the documented & recommended approach by Microsoft.

I’ve been installing & using these two procedures since I started working with SQL Server, back at the turn of the century. In the nearly two decades since, I’ve blindly installed & used these procedures, first on SQL Server 2000, and then on every version since… just because that’s the way I’ve always done it. But our recent performance problems made me rethink that, and dive in to take a look at the two procedures to see if I could do better, which made me realize, OHBOY! WE CAN DO BETTER!!

Read on to understand how.


Maximizing Availability Group Performance

Jonathan Kehayias has a few tips for improving performance of your Availability Groups:

Since Microsoft first introduced the Always On Availability Groups (AGs) feature in SQL Server 2012, there’s been a lot of interest in using AGs for both high availability and disaster recovery (HADR), as well as for offloading read-only workloads. The combination of the best features for failover clustering, the simplicity of data movement and synchronization from database mirroring, and the ability to offload read-only workloads to secondaries has given businesses a compelling reason to upgrade to leverage AGs.

But, as the saying goes, there’s no such thing as a free lunch, and there are several performance implications and considerations you must be aware of to have a successful deployment using AGs. This blog post will explore some of the considerations and look at how to plan, architect, and implement an AG with minimal latency and performance impact on the production workload.

Click through for those tips.


Automate Availability Group Failover for SSISDB 2012 and 2014

Alex Stuart shows how to fail over SSISDB in SQL Server 2012 or 2014:

Hopefully not many people are still configuring SSIS instances on SQL 2012 or 2014 – especially HA instances – but if you are, this post is for you.

If you’re running SQL Server 2016 or above, having the SSIS catalog function correctly in an AG is supported by built-in functionality to manage the DMK (database master key). In 2012/2014, however, there is no such support. Without intervention this makes SSISDB unable to be opened after a failover, because the DMK isn’t open, leading to errors such as “Please create a master key in the database or open the master key in the session before performing this operation.”

Read on to see how to resolve this error, and then how to do this automatically.


Availability Groups and Logins

Andrea Allred runs into a post-failover issue:

While doing a planned Availability Group failover, the application stopped talking to the database. After checking the SQL Server log, we found that all the SQL Logins were failing with an “incorrect password” error. The logins were on the server, the users were in the databases, and the passwords were even right, so what was wrong? It all comes down to SIDs (Security Identifiers).

Read on for the cause and the solution. I’d also recommend Sync-DbaAvailabilityGroup as a good dbatools cmdlet to use.
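
As a rough illustration of the kind of check involved, here is a minimal Python sketch. It assumes you have already pulled login names and SIDs from `sys.server_principals` on each replica into plain dictionaries; the function name and the sample data are hypothetical and are not taken from Andrea's post.

```python
def find_sid_mismatches(primary_logins, secondary_logins):
    """Compare login name -> SID mappings from two replicas.

    Returns (mismatched, missing):
      mismatched: logins present on both replicas but with different SIDs
      missing:    logins present on the primary but absent on the secondary
    """
    mismatched = sorted(
        name
        for name, sid in primary_logins.items()
        if name in secondary_logins and secondary_logins[name] != sid
    )
    missing = sorted(
        name for name in primary_logins if name not in secondary_logins
    )
    return mismatched, missing


# Hypothetical sample data; real SIDs are longer varbinary values.
primary = {"app_user": "0x01AA", "report_user": "0x02BB", "etl_user": "0x03CC"}
secondary = {"app_user": "0x01AA", "report_user": "0x09FF"}

mismatched, missing = find_sid_mismatches(primary, secondary)
print("Different SID on secondary:", mismatched)
print("Missing on secondary:", missing)
```

Any login reported in either list is a candidate for being recreated on the secondary with the SID it has on the primary.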


Understanding Long Failover Times for Availability Groups

Sean Gallardy has answers to your Availability Group questions, as long as you ask the specific question in this post:

One of the most common issues I look at from day to day is some variation of the question, “Why did it take a long time for my AG/Database to fail over?” There are many different meanings for this innocuously simple-looking question. For example, was it that the failover time was long, or was it a long time bringing the database online, or was it that it took a long time because a failover wasn’t possible? And what *exactly* is a long time? Does a long time mean 10 seconds, 1 minute, 5 minutes, 30 minutes? For each business and its needs, “long” fluctuates dramatically. I’d like to go through, at a high level, some of the most common reasons that I troubleshoot and whether they might apply to your environment. FYI, if you tell me 1 second is a long time then I’m going to point you toward different architectures with multiple layers of caches and front-end servers/services, which isn’t going to be cheap, but that’s what you want so you’re _willing_ to pay for it, right? Yeah, I thought not.

Click through for several factors which may affect how long it takes for a failover to occur.
