Category: HA / DR

The two main requirements around high availability are commonly known as RTO and RPO.

RTO – stands for Recovery Time Objective and is the maximum allowable downtime when a failure occurs. In other words, how much time it takes for your databases to be up and running.

RPO – stands for Recovery Point Objective and is the maximum allowable data-loss when a failure occurs. Of course, the ideal scenario is not to lose any data, but a more realistic (and also ideal) scenario is to not lose any committed data, also known as Zero Committed Data Loss.

With those definitions out of the way, read on to learn more.

Comments closed

Database Mirroring Compatibility and Availability Groups

Published 2022-01-24 by Kevin Feasel

Sean Gallardy checks out the past:

Around 2005, mirroring was born. It was an evolution on log shipping, which is taking log backups, moving them around, and restoring them all in an automated fashion to different servers. Mirroring upped that game and created a dedicated network channel between servers (you could only have 1 principle and 1 mirror, so 2 total) so that there wasn’t this funny business of copying and restoring, additionally it allowed the mirror server to be a highly available copy with automatic failover. Since Microsoft marketing is terrible at naming things, it was originally called, “Real Time Log Shipping” which was then changed to “Mirroring” and in typical fashion you can find the unofficial “Real Time Log Shipping” name all over the place where it was never updated. (I can’t really blame them here, though, it’s hard to find all the little places you’re putting this moniker in and then having some other team tell you to change it all at some way later point)

Read the whole thing. It’s a fun read, a little sad, and helps us understand a bit of availability group behavior which might bite the unaware. I will definitely defend Microsoft’s backward-compatibility emphasis. This makes life so much easier for developers than a lot of other languages and environments. In the R and Python worlds, breaking changes are the norm, meaning that when you update packages, you can expect something to break and now that “20-minute” package upgrade ticket becomes 3 days of trying to sort out what went wrong.

Comments closed

Auto-Failover Groups for Azure SQL Hyperscale

Published 2022-01-18 by Kevin Feasel

Melody Zacharias fills us in on a recent announcement:

On January 5th they announced, auto-failover groups for Azure SQL Hyperscale are now available in preview. Auto-failover groups is a feature that allows you to manage the failover and replication of a group of databases on a server or managed instance from one region to another region in Azure. This can be done manually or in conjunction with a user-defined policy.

Click through for more information on how it all works.

Comments closed

Managed Instance Failover Groups

Published 2021-11-23 by Kevin Feasel

Arun Sirpal takes us through Azure SQL Managed Instance failover groups:

If you have been following me for a while you will know that I really like the Fail over groups within Azure SQL DB and it is no different to when applying it to Managed Instances. If you want a rock-solid DR plan, this is the way forward.
Remember it’s an abstraction layer on top of the active geo-replication feature, before this we had to do a lot of manual one to one database setups but now this feature simplifies deployment and management of geo-replicated databases at scale. You can initiate failover manually or automatically if there is a massive failure (researching this topic this could mean things from memory leaks to wrong network cables cut during routine hardware decommissioning – you never know, it could happen so plan for it)

Click through to see how to set this up and what failover looks like.

Comments closed

Full Backups and Log Shipping

Published 2021-10-19 by Kevin Feasel

Kevin Hill answers ja question about whether full backups break log shipping:

Keep on doing your full backups.
Make sure that any databases you Log Ship are NOT also doing log backups in your SQL Maintenance Plans, Ola Jobs, etc.

Click through for a video walkthrough on the topic.

Comments closed

Tips for Azure Site Recovery

Published 2021-08-13 by Kevin Feasel

Joey D’Antoni shares a few experiences when using Azure Site Recovery:

I need to blog more. Stupid being busy. Anyway, last week, we were doing a small scale test for a customer, and it didn’t work the way we were expecting, and for one of the dumbest reasons I’ve ever seen. If you aren’t familiar with Azure Site Recovery it provides disk level replication for VMs, and allows you to bring on-premises VMs online in Azure, or in another Azure region, if you VMs are in Azure already. It’s not an ideal solution for busy SQL Server VMs with extremely low recovery point objectives, however, if you need a simple DR solution for a group of VMs, and can sustain around 30 minutes of data loss, it is cheap and easy. The other benefit that ASR provides, similar to VMware’s Site Recovery Manager, is the ability to do a test recovery in a bubble environment.

Read on for notes from Joey.

Comments closed

NameNode and Secondary NameNode in Hadoop

Published 2021-06-01 by Kevin Feasel

The Hadoop in Real World team hit on a naming scheme that I think is bad:

NameNode is the heart of HDFS. NameNode maintains the metadata of HDFS – files, list of blocks, directories, permissions etc. The metadata is persisted on a file named FSIMAGE. During the start up of NameNode, the FSIMAGE file will be read and loaded into memory.
Any ongoing changes to the files, directories in FSIMAGE will be written to memory and to a temporary log file. NameNode does not save the ongoing changes to FSIMAGE directly and this is because FSIMAGE file could be big for a big HDFS and updating a big file at runtime will be quite expensive and slow.

Read on to learn what the secondary NameNode does. As a hint, it’s not a secondary NameNode in the sense of high availability. If you’re a new Hadoop administrator, the name can be deceiving, letting you think you have high availability when you really don’t.

Comments closed

Handling Disaster Recovery

Published 2021-05-27 by Kevin Feasel

Randolph West has a disaster recovery plan:

I’ve had several occasions where hard drives have failed and attempts to recover data from these wonders of mechanical engineering have been mostly fruitless. I’ve experienced profound examples of data loss, in both cases losing years of email and contact details for people I met online.
This is all to say that I care deeply about data loss, and I take it personally when I’m asked to engage with potential customers to recover data in SQL Server.
This post is a high-level overview of how I tackle data recovery, whether personally or for professional consulting reasons.

Click through for the steps.

Comments closed

High Availability Options for DBAs

Published 2021-05-20 by Kevin Feasel

Pamela Mooney has a list:

In previous articles in this series, I have stated that the job of the DBA is to make the right data available to the right people as quickly as possible.
Here is where we delve more into the word “available” and take it up a notch. SQL Server offers several options for high availability, and understanding the advantages and caveats of each one will give you the best chance of ensuring the availability of data in any scenario. Let’s discuss the options for high availability in general terms and find out where to go to get more information as you need it.
Due to the breadth of this article and keeping with the idea of just learning the basics, I am not going to cover Azure here except to say that Azure either has compatibility with these features in most of its offerings uses them in background processes.

Read on for the list.

Comments closed

Mixed MultiSubnetFailover Support on AGs

Published 2020-10-07 by Kevin Feasel

Andy Mallon continues a line of thought:

In yesterday’s post, I showed how to configure an availability group (AG) to use the RegisterAllProvidersIP=0 when you can’t get clients to connect using the MultiSubnetFailover=true connection string attribute.
I mentioned that you have to make some trade-offs when you set RegisterAllProvidersIP=0, and included this comparison:
But….when if you can eat your cake and have it, too?
In some cases, you’ll have some applications & clients that are not able to use MultiSubnetFailover=true, and other clients that can. Perhaps you’re working on updating a bunch of legacy Java apps to move from old jTDS drivers to the current Microsoft JDBC drivers that properly support MultiSubnetFailover=true. Parts of your codebase have been updated, and you want them to make use of the connection string attribute for fast cross-subnet failover. But other parts of your codebase are still being updated and rely on the RegisterAllProvidersIP cluster parameter to be false. Wouldn’t it be nice to have both?

Read on to learn how.

Comments closed