Category: HA / DR

Forced Quorum Failures with WSFC

Eitan Blumin can’t reach quorum:

The incident started with a late-night phone call from one of our customers (it’s always a late-night phone call, isn’t it?).

They reported that during a DR exercise on their production environment (Chaos Engineering, anyone?) their entire cluster failed and they weren’t able to bring any of the replicas back online.

Click through for the full story, including what happened, why it happened, and what you can do to prevent similar problems in the future.

SQL Server Failover Clusters in Linux

I phone it in:

In this video, we will talk about Failover Cluster Instances in SQL Server on Linux.

This video stays in the academic realm because I don’t have an enterprise version of Linux (either RHEL or SLES) and don’t have a SAN or NAS, so I couldn’t actually show any of it off. Still, somehow I turned the utter lack of demo into almost a 20-minute video.

Backups Are for DR, Not HA

Kevin Hill gives us a poignant reminder:

Please continue doing your backups!

Backups are Disaster Recovery, yes…but not HA.

Some will argue with this (in the comments most likely), but I broadly define “High Availability” as a system that can recover in seconds or minutes at most. Sometimes that is automatic, sometimes manual.

I agree that backups are for DR, not HA. I’d consider log shipping an option for both HA and DR, albeit one that requires manual failover (or rigging up a script that performs the failover for you).

I disagree about replication as an HA solution. Yes, you do need to make sure that everything can replicate, but if your publisher goes down, the subscriber can continue and your data is still available for use. And if you’re a complete masochist, you can use merge replication to allow writes to continue while the publisher is down. Cleaning up after that is a mess, especially if you end up with a bunch of conflicts, but High Availability doesn’t mean Easy Mode.

Hybrid Failover Rights from SQL Server 2022 to Azure SQL MI

Dani Ljepava explains a new benefit:

Hybrid failover rights is a new benefit that allows you to run a license-free Azure SQL Managed Instance when used as a passive DR replica for your SQL Server 2022 licensed under Software Assurance (SA), or under the pay-as-you-go billing option.

How the Hybrid Failover Rights benefit works

The new Hybrid failover rights licensing benefit is technology agnostic. You can use any technology, such as MI link as the most advanced replication technology using Always On, or perhaps LRS, ADF, transactional replication, backup and restore, or similar to set up replication between SQL Server and Managed Instance. As long as you are using Azure SQL Managed Instance only as a passive replica for your SQL Server 2022, you are eligible to apply the new licensing benefit.

Read on for more details on how you can activate this benefit.

Auto-Failover Groups in Azure SQL DB

Etienne Lopes wraps up a series:

So, first of all, what are auto-failover groups?

The auto-failover groups feature allows you to manage the replication and failover of databases to another Azure region. You can include a group of databases or all user databases in a logical server to be replicated to another logical server. It is a declarative abstraction on top of the active geo-replication feature, designed to simplify deployment and management of geo-replicated databases at scale.

Read on to see some of the benefits of this, as well as how to enable it.

Oracle: RMAN and Non-Synchronizing Standby Database

David Fitzjarrell proffers advice on recovering from a non-synchronizing standby database:

Occasionally the unthinkable can occur and the DBA can be left with a standby database that is no longer synchronizing with the primary. A plethora of “advice” will soon follow that discovery, most of it much like this:

“Well, ya gotta rebuild it.”

Of course the question to ask is “how far out of synch is the standby?” That question is key in determining how to attack this situation. Let’s go through the two most common occurrences of this and see how to address them.

Read on to see David’s advice.

Service Level Agreements (RPO and RTO) and SQL Server

David Klee wants to know how much downtime is acceptable to you:

Database professionals of the world – I have a question. Has your organization defined service level agreements (SLAs) for your data estate? I’m talking specifically the Recovery Point Objective (RPO) and Recovery Time Objective (RTO), and to have these defined not in an arbitrary number of nines, but in minutes or hours. If these aren’t defined from above, your business continuity plan is doomed to fail.

Read on to learn what RPO and RTO mean, how to think in terms of RPO and RTO, and some of David’s recommendations.
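
As a rough illustration of thinking about RPO and RTO in minutes or hours rather than nines, here is a minimal Python sketch. It is not taken from David's post. The function names, the backup interval, and the restore estimate are all hypothetical, and it assumes the simplest case where worst-case data loss equals the gap between transaction log backups.

```python
from datetime import timedelta


def worst_case_data_loss(log_backup_interval: timedelta) -> timedelta:
    """Assumption: a failure hits just before the next log backup runs,
    so everything since the previous log backup is lost."""
    return log_backup_interval


def meets_objectives(
    log_backup_interval: timedelta,
    estimated_restore_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> dict:
    """Compare a (hypothetical) backup strategy against stated objectives."""
    return {
        "rpo_met": worst_case_data_loss(log_backup_interval) <= rpo,
        "rto_met": estimated_restore_time <= rto,
    }


# Illustrative numbers only: log backups every 15 minutes, a 2-hour restore,
# and a business asking for a 5-minute RPO and a 4-hour RTO.
result = meets_objectives(
    log_backup_interval=timedelta(minutes=15),
    estimated_restore_time=timedelta(hours=2),
    rpo=timedelta(minutes=5),
    rto=timedelta(hours=4),
)
print(result)
```

With these made-up numbers the restore time fits inside the RTO, but 15-minute log backups cannot satisfy a 5-minute RPO, which is the kind of gap that only shows up once the objectives are written down as durations.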

Trying out Azure Geo-Replication

Etienne Lopes continues a series on Azure SQL DB HA/DR:

So, first of all, what is Active Geo-Replication?

Active geo-replication is a feature that lets you create a continuously synchronized readable secondary database for a primary database. The readable secondary database may be in the same Azure region as the primary, or, more commonly, in a different region. This kind of readable secondary database is also known as a geo-secondary or geo-replica.

Read on to learn more about the topic, including how to set it up and ways to try it out.

Three-Node Postgres HA Cluster with pg_cirrus

Salman Ahmed wants to be highly available:

We are thrilled to announce the release of pg_cirrus! First of all, you might be wondering what “cirrus” means. The term refers to the thin and wispy clouds that are often seen at high altitudes.

pg_cirrus is a simple and automated solution to deploy highly available 3-node PostgreSQL clusters with auto failover. It is built using Ansible and to perform auto failover and load balancing we are using pgpool.

Read on to see how it works. It’s also licensed under GPLv3, so it’s not only highly available but also freely available.

Data Inconsistency in Postgres HA Clusters

Umair Shahid gives us an overview:

While PostgreSQL is known for its robustness, scalability, and reliability, data inconsistency can occur in PostgreSQL clusters, which can cause issues and impact the overall performance of the system. In this blog, we’ll define data inconsistency in PostgreSQL clusters, discuss the challenges it poses, its causes, and provide some tips on how to prevent and resolve it if it occurs.

Click through for the article.
