Press "Enter" to skip to content

Category: HA / DR

Data Inconsistency in Postgres HA Clusters

Umair Shahid gives us an overview:

While PostgreSQL is known for its robustness, scalability, and reliability, data inconsistency can occur in PostgreSQL clusters, which can cause issues and impact the overall performance of the system. In this blog, we’ll define data inconsistency in PostgreSQL clusters, discuss the challenges it poses, its causes, and provide some tips on how to prevent and resolve it if it occurs.

Click through for the article.

Comments closed

False Alarms in Highly Available Postgres Clusters

Umair Shahid pulls the alarm:

False alarms can be a significant problem in highly available clusters of PostgreSQL. They can cause unnecessary downtime and disruptions that can impact the performance of the nodes. In this blog post, we will explore the causes, prevention, and resolution of false alarms in PostgreSQL clusters.

It’s a good idea to sit back and think about how complex the problem of high availability is, even if the service (SQL Server, Postgres, or whatever) offers capabilities to simplify a lot of it. The trick is that you want your service to fail over if and only if it needs to, but what tells you if it “needs to” is noisy signal.

Comments closed

Creating a Disaster Recovery Plan for Synapse

Freddie Santos talks HA/DR with Synapse:

Many of our customers have been asking about creating a disaster recovery plan for their Synapse Workspace. In a new blog series, we will cover the basics of disaster recovery and business continuity, discussing available options and custom solutions.

In this first post, we’ll review important concepts and questions to answer before building a disaster recovery plan, including the differences between High Availability and Disaster Recovery.

The focus in this post is on the dedicated SQL pool and Azure Data Lake Storage Gen2 (because people still think about Gen1?), though that’s the majority of what you’d need to think about—Spark pools and the serverless SQL pool really drive from the data lake. There’s also Data Explorer pools, which have their own storage and HA/DR capabilities.

Comments closed

Business Continuity with Arc-Enabled Data Services

Warwick Rudd continues a series on Azure Arc-Enabled Data Services. Part 11 covers high availability:

So far in this series of posts, you have been able to deploy and configure your newly provisioned Azure Arc-enabled SQL MI environment. Out of the box you get High Availability without having to do or implement anything.

The Recovery Time Objective (RTO) that is achievable with Azure Arc-enabled Data Services is dependent on the tier you choose to deploy. But regardless of that, this post is only concerned about informing you what you get out of the box with this technology.

Part 12 turns to disaster recovery:

In the previous post, we introduced you to how Azure Arc-enabled SQL MI provides High Availability based on the tier you have deployed.  If your environment requires disaster recovery, regardless of the tier level you have deployed, Azure Arc-enabled Data Services covers the job for you.

Read on to learn more about what options are available and what you need to do.

Comments closed

Portworx and Kubernetes Storage Failover

Andrew Pruski digs into a problem:

In a nutshell, the issue is that the attachdetach-controller in Kubernetes won’t detach storage from an offline node until that node is either brought back online or is removed from the cluster. What this means is that a pod spinning up on a new node that requires that storage can’t come online.

Aka, if you’re running SQL Server in Kubernetes and a node fails, SQL won’t be able to come back online until someone manually brings the node online or deletes the node.

Not great tbh, and it’s been a blocker for my PoC testing.

However, there are ways around this…one of them is by a product called Portworx which I’m going to demo here.

After a short disclaimer, there’s plenty of good content.

Comments closed

Attaching All SQL Server Data Files in a Single Directory

David Fowler migrated a bunch of databases:

Have you ever had the need to attach a large number of database in one go? There’s no way to attach multiple databases in SSMS or via script, so you’re probably going to be left with the slow, arduous task of doing them one by one.

I recently had to deal with a DR situation (I won’t go into details of what happened just yet as things are still quite sensitive, but I might look at it at some point in the future) where I faced exactly that issue. For one reason or another I needed to attach several hundred databases quickly. I didn’t fancy doing that via SSMS or script each one individually so I knocked together this script to do the job for me.

Click through for that script and instructions. Alternatively, a bit of Powershell and the right dbatools command could get you to the same result but this is good in the event that you can’t leave SSMS.

Comments closed

High Availability in SQL Managed Instance General Purpose Tier

Niko Neugebauer clears up what options you have for high availability in SQL MI’s General Purpose tier:

The two main requirements around high availability are commonly known as RTO and RPO.


RTO
 – stands for Recovery Time Objective and is the maximum allowable downtime when a failure occurs. In other words, how much time it takes for your databases to be up and running.


RPO
 – stands for Recovery Point Objective and is the maximum allowable data-loss when a failure occurs. Of course, the ideal scenario is not to lose any data, but a more realistic (and also ideal) scenario is to not lose any committed data, also known as Zero Committed Data Loss.

With those definitions out of the way, read on to learn more.

Comments closed

Database Mirroring Compatibility and Availability Groups

Sean Gallardy checks out the past:

Around 2005, mirroring was born. It was an evolution on log shipping, which is taking log backups, moving them around, and restoring them all in an automated fashion to different servers. Mirroring upped that game and created a dedicated network channel between servers (you could only have 1 principle and 1 mirror, so 2 total) so that there wasn’t this funny business of copying and restoring, additionally it allowed the mirror server to be a highly available copy with automatic failover. Since Microsoft marketing is terrible at naming things, it was originally called, “Real Time Log Shipping” which was then changed to “Mirroring” and in typical fashion you can find the unofficial “Real Time Log Shipping” name all over the place where it was never updated. (I can’t really blame them here, though, it’s hard to find all the little places you’re putting this moniker in and then having some other team tell you to change it all at some way later point)

Read the whole thing. It’s a fun read, a little sad, and helps us understand a bit of availability group behavior which might bite the unaware. I will definitely defend Microsoft’s backward-compatibility emphasis. This makes life so much easier for developers than a lot of other languages and environments. In the R and Python worlds, breaking changes are the norm, meaning that when you update packages, you can expect something to break and now that “20-minute” package upgrade ticket becomes 3 days of trying to sort out what went wrong.

Comments closed

Auto-Failover Groups for Azure SQL Hyperscale

Melody Zacharias fills us in on a recent announcement:

On January 5th they announced, auto-failover groups for Azure SQL Hyperscale are now available in preview. Auto-failover groups is a feature that allows you to manage the failover and replication of a group of databases on a server or managed instance from one region to another region in Azure. This can be done manually or in conjunction with a user-defined policy. 

Click through for more information on how it all works.

Comments closed

Managed Instance Failover Groups

Arun Sirpal takes us through Azure SQL Managed Instance failover groups:

If you have been following me for a while you will know that I really like the Fail over groups within Azure SQL DB and it is no different to when applying it to Managed Instances. If you want a rock-solid DR plan, this is the way forward.

Remember it’s an abstraction layer on top of the active geo-replication feature, before this we had to do a lot of manual one to one database setups but now this feature simplifies deployment and management of geo-replicated databases at scale. You can initiate failover manually or automatically if there is a massive failure (researching this topic this could mean things from memory leaks to wrong network cables cut during routine hardware decommissioning – you never know, it could happen so plan for it)

Click through to see how to set this up and what failover looks like.

Comments closed