HA / DR – Curated SQL

Zone Redundancy in Azure SQL Managed Instance

Published 2025-07-17 by Kevin Feasel

Arun Sirpal explains what zone redundancy is in Azure:

Do you know what happens when you enable zonal redundancy for your SQL managed instance?

Let’s define it first (in the context of the Business-Critical tier): zonal redundancy is achieved by placing compute and storage replicas in three different availability zones and then using the underlying Always On availability group to replicate data changes from the primary instance to standby replicas in the other availability zones.

Availability zones are in the same Azure region, so it works well for high availability but isn’t as good for disaster recovery: if an entire region goes down, zone redundancy won’t help you very much. Also, be aware that you’re paying for what’s running in those three zones because TANSTAAFL.

High Availability Architecture for PostgreSQL

Published 2025-07-03 by Kevin Feasel

Umair Shahid adds a 9:

Most teams building production applications understand that “uptime” matters. I am writing this blog to demonstrate how much difference an extra 0.09% makes.

At 99.9% availability, your system can be down for over 43 minutes every month. At 99.99%, that window drops to just over 4 minutes. If your product is critical to business operations, customer workflows, or revenue generation, those 39 extra minutes of downtime each month can be the difference between trust and churn.

Click through for some of the tools and practices that can help get you there in PostgreSQL.
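
The availability figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, not something from Umair's post: the function name is made up for illustration, and it assumes a 30-day month, which is what the quoted numbers appear to imply.

```python
# Assumption: a 30-day month (43,200 minutes), which matches the quoted figures.
MINUTES_PER_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Return the monthly downtime budget, in minutes, for an availability percentage."""
    return MINUTES_PER_MONTH * (1 - availability_pct / 100)


if __name__ == "__main__":
    three_nines = allowed_downtime_minutes(99.9)
    four_nines = allowed_downtime_minutes(99.99)
    print(f"99.9%  -> {three_nines:.2f} minutes per month")
    print(f"99.99% -> {four_nines:.2f} minutes per month")
    print(f"difference -> {three_nines - four_nines:.2f} minutes per month")
```

Under that assumption, 99.9% works out to roughly 43.2 minutes and 99.99% to roughly 4.32 minutes, so the gap is a little under 39 minutes per month.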

Using Barman to Back Up HA-Enabled PostgreSQL Clusters

Published 2025-07-01 by Kevin Feasel

Semab Tariq reminds us that high availability is not disaster recovery:

Barman is a popular tool in the PostgreSQL ecosystem for managing backups, especially in High Availability (HA) environments. It’s known for being easy to set up and for offering multiple types and modes of backups. However, this flexibility can also be a bit overwhelming at first. That’s why I’m writing this blog to break down each backup option in a simple and clear way, so you can choose the one that best fits your business needs.

Click through for the available options, as well as some recommendations.

Choosing a High Availability Solution in PostgreSQL

Published 2025-06-24 by Kevin Feasel

Semab Tariq compares two alternatives:

When designing a highly available PostgreSQL cluster, two popular tools often come into the conversation: Pgpool-II and Patroni. Both are widely used in production environments, offer solid performance, and aim to improve resilience and reduce downtime; however, they take different approaches to achieving this goal.

We often get questions during webinars/talks and customer calls about which tool is better suited for production deployments. So, we decided to put together this blog to help you understand the differences and guide you in choosing the right solution based on your specific use case.

Click through for a primer on the topic, followed by some recommendations.


Split-Brain Scenarios in PostgreSQL Clusters

Published 2025-06-02 by Kevin Feasel

Semab Tariq knows that an application cannot serve two masters:

In this blog post, we will try to explore a critical failure condition known as a split-brain scenario that can occur in PostgreSQL HA clusters. We will first see what split-brain means, then how it can impact PostgreSQL clusters, and finally discuss how to prevent it through architectural choices and tools available in the PostgreSQL ecosystem.

Click through for an explanation of split-brain and what can cause this problem. Additionally, Semab includes several tips on how to limit the likelihood of a split-brain scenario occurring.


HA/DR in Oracle with Data Guard

Published 2025-05-30 by Kevin Feasel

Kellyn Gorman takes a peek at Oracle Data Guard:

In its traditional (and free) configuration, Oracle Data Guard operates in an active/passive architecture. This incredibly well-designed and valuable solution from Oracle, which comes included with the Enterprise Edition, has as part of its architecture:

A primary database, which is an active, accessible database system.

One or more standby databases, which are passive replicas that continuously receive redo data from the primary.

Click through for an overview of the product.


Database Snapshots in High-Availability Setups

Published 2025-05-14 by Kevin Feasel

Stephen Planck adds one more layer of complexity:

SQL Server’s database-snapshot feature is a wonderfully simple tool: at the instant you create the snapshot, every page in the database is marked “copy-on-write.” Nothing is copied across the wire, no blocking locks appear, and the snapshot opens immediately as a read-only database on the local replica. Queries against the snapshot see the world exactly as it looked at that moment while the live workload keeps changing pages in the primary data files. Because snapshots live only in sparse files on the server that owns them, they are not a replacement for backups—but they are perfect for ad-hoc reporting, quick “before-and-after” comparisons, or a safety net when you want an easy way to back out a risky change that should finish within minutes or hours.

But read on to see how they interact with high-availability features such as transactional replication and availability groups.
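
To make the copy-on-write idea in the quote concrete, here is a toy in-memory model. It is only an illustration under loud assumptions: the class and method names are hypothetical, pages are plain dictionary entries, the set of pages is fixed at snapshot time, and none of this reflects how SQL Server implements snapshots internally.

```python
class ToySnapshot:
    """Read-only view of a ToyDatabase as of the moment the snapshot was created."""

    def __init__(self, source):
        self.source = source
        self.sparse = {}  # holds only the pages that changed after creation

    def preserve(self, page_id):
        # Save the original page contents once, just before its first change.
        if page_id not in self.sparse:
            self.sparse[page_id] = self.source.pages[page_id]

    def read_page(self, page_id):
        # Changed pages come from the sparse store, unchanged ones from the source.
        if page_id in self.sparse:
            return self.sparse[page_id]
        return self.source.pages[page_id]


class ToyDatabase:
    """Stand-in for a database made of a fixed set of numbered pages."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.snapshots = []

    def create_snapshot(self):
        # Nothing is copied at creation time; the snapshot starts out empty.
        snapshot = ToySnapshot(self)
        self.snapshots.append(snapshot)
        return snapshot

    def write_page(self, page_id, new_value):
        # Copy-on-write: hand the old contents to each snapshot, then overwrite.
        for snapshot in self.snapshots:
            snapshot.preserve(page_id)
        self.pages[page_id] = new_value


if __name__ == "__main__":
    db = ToyDatabase({1: "alpha", 2: "beta"})
    snap = db.create_snapshot()
    db.write_page(1, "ALPHA")
    print(snap.read_page(1), db.pages[1], len(snap.sparse))
```

In this toy, the snapshot keeps returning the original contents of page 1 while the live database sees the new value, and only the one changed page occupies space in the snapshot's sparse store.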


Failover Groups in Azure SQL Database

Published 2025-02-21 by Kevin Feasel

Mika Sutinen looks at some interesting functionality:

One of the interesting features in Azure SQL Database is Failover Groups. It allows you to manage replication of an Azure SQL database, or group of databases, to another logical server. The reason I’ve bolded “manage replication” is that the replication itself is handled by active geo-replication, which is also a feature of Azure SQL Database.

Read on to see how these are different and why you might want to use failover groups.


Using Kubernetes with Distributed Availability Groups

Published 2024-11-19 by Kevin Feasel

Andrew Pruski has a guide for us:

A while back I wrote about how to use a Cross Platform (or Clusterless) Availability Group to seed a database from a Windows SQL instance into a pod in Kubernetes.

I was talking with a colleague last week and they asked, “What if the existing Windows instance is already in an Availability Group?”

This is a fair question, as it’s fairly rare (in my experience) to run a standalone SQL instance in production…most instances are in some form of HA setup, be it a Failover Cluster Instance or an Availability Group.

Read on for the tutorial. There are quite a few steps involved.


Cross-Regional Failover Clusters in Google Cloud Platform

Published 2024-10-31 by Kevin Feasel

Dave Bermingham builds a cluster:

I was the principal author of this SIOS whitepaper, which describes how to build a 2-node SQL Server cluster in Google Cloud Platform (GCP) spanning multiple zones. Today, I’ll explain how to extend this cluster by adding a third node in a different GCP region.

Check out the paper and then Dave’s step-by-step instructions.

