HA / DR – Page 4 – Curated SQL

Portworx and Kubernetes Storage Failover

Published 2022-10-13 by Kevin Feasel

In a nutshell, the issue is that the attachdetach-controller in Kubernetes won’t detach storage from an offline node until that node is either brought back online or is removed from the cluster. What this means is that a pod spinning up on a new node that requires that storage can’t come online.
Aka, if you’re running SQL Server in Kubernetes and a node fails, SQL won’t be able to come back online until someone manually brings the node online or deletes the node.
Not great tbh, and it’s been a blocker for my PoC testing.
However, there are ways around this…one of them is by a product called Portworx which I’m going to demo here.

After a short disclaimer, there’s plenty of good content.

Comments closed

Attaching All SQL Server Data Files in a Single Directory

Published 2022-08-16 by Kevin Feasel

David Fowler migrated a bunch of databases:

Have you ever had the need to attach a large number of database in one go? There’s no way to attach multiple databases in SSMS or via script, so you’re probably going to be left with the slow, arduous task of doing them one by one.
I recently had to deal with a DR situation (I won’t go into details of what happened just yet as things are still quite sensitive, but I might look at it at some point in the future) where I faced exactly that issue. For one reason or another I needed to attach several hundred databases quickly. I didn’t fancy doing that via SSMS or script each one individually so I knocked together this script to do the job for me.

Click through for that script and instructions. Alternatively, a bit of Powershell and the right dbatools command could get you to the same result but this is good in the event that you can’t leave SSMS.

Comments closed

High Availability in SQL Managed Instance General Purpose Tier

Published 2022-05-02 by Kevin Feasel

Niko Neugebauer clears up what options you have for high availability in SQL MI’s General Purpose tier:

The two main requirements around high availability are commonly known as RTO and RPO.

RTO – stands for Recovery Time Objective and is the maximum allowable downtime when a failure occurs. In other words, how much time it takes for your databases to be up and running.

RPO – stands for Recovery Point Objective and is the maximum allowable data-loss when a failure occurs. Of course, the ideal scenario is not to lose any data, but a more realistic (and also ideal) scenario is to not lose any committed data, also known as Zero Committed Data Loss.

With those definitions out of the way, read on to learn more.

Comments closed

Database Mirroring Compatibility and Availability Groups

Published 2022-01-24 by Kevin Feasel

Sean Gallardy checks out the past:

Around 2005, mirroring was born. It was an evolution on log shipping, which is taking log backups, moving them around, and restoring them all in an automated fashion to different servers. Mirroring upped that game and created a dedicated network channel between servers (you could only have 1 principle and 1 mirror, so 2 total) so that there wasn’t this funny business of copying and restoring, additionally it allowed the mirror server to be a highly available copy with automatic failover. Since Microsoft marketing is terrible at naming things, it was originally called, “Real Time Log Shipping” which was then changed to “Mirroring” and in typical fashion you can find the unofficial “Real Time Log Shipping” name all over the place where it was never updated. (I can’t really blame them here, though, it’s hard to find all the little places you’re putting this moniker in and then having some other team tell you to change it all at some way later point)

Read the whole thing. It’s a fun read, a little sad, and helps us understand a bit of availability group behavior which might bite the unaware. I will definitely defend Microsoft’s backward-compatibility emphasis. This makes life so much easier for developers than a lot of other languages and environments. In the R and Python worlds, breaking changes are the norm, meaning that when you update packages, you can expect something to break and now that “20-minute” package upgrade ticket becomes 3 days of trying to sort out what went wrong.

Comments closed

Auto-Failover Groups for Azure SQL Hyperscale

Published 2022-01-18 by Kevin Feasel

Melody Zacharias fills us in on a recent announcement:

On January 5th they announced, auto-failover groups for Azure SQL Hyperscale are now available in preview. Auto-failover groups is a feature that allows you to manage the failover and replication of a group of databases on a server or managed instance from one region to another region in Azure. This can be done manually or in conjunction with a user-defined policy.

Click through for more information on how it all works.

Comments closed

Managed Instance Failover Groups

Published 2021-11-23 by Kevin Feasel

Arun Sirpal takes us through Azure SQL Managed Instance failover groups:

If you have been following me for a while you will know that I really like the Fail over groups within Azure SQL DB and it is no different to when applying it to Managed Instances. If you want a rock-solid DR plan, this is the way forward.
Remember it’s an abstraction layer on top of the active geo-replication feature, before this we had to do a lot of manual one to one database setups but now this feature simplifies deployment and management of geo-replicated databases at scale. You can initiate failover manually or automatically if there is a massive failure (researching this topic this could mean things from memory leaks to wrong network cables cut during routine hardware decommissioning – you never know, it could happen so plan for it)

Click through to see how to set this up and what failover looks like.

Comments closed

Full Backups and Log Shipping

Published 2021-10-19 by Kevin Feasel

Kevin Hill answers ja question about whether full backups break log shipping:

Keep on doing your full backups.
Make sure that any databases you Log Ship are NOT also doing log backups in your SQL Maintenance Plans, Ola Jobs, etc.

Click through for a video walkthrough on the topic.

Comments closed

Tips for Azure Site Recovery

Published 2021-08-13 by Kevin Feasel

Joey D’Antoni shares a few experiences when using Azure Site Recovery:

I need to blog more. Stupid being busy. Anyway, last week, we were doing a small scale test for a customer, and it didn’t work the way we were expecting, and for one of the dumbest reasons I’ve ever seen. If you aren’t familiar with Azure Site Recovery it provides disk level replication for VMs, and allows you to bring on-premises VMs online in Azure, or in another Azure region, if you VMs are in Azure already. It’s not an ideal solution for busy SQL Server VMs with extremely low recovery point objectives, however, if you need a simple DR solution for a group of VMs, and can sustain around 30 minutes of data loss, it is cheap and easy. The other benefit that ASR provides, similar to VMware’s Site Recovery Manager, is the ability to do a test recovery in a bubble environment.

Read on for notes from Joey.

Comments closed

NameNode and Secondary NameNode in Hadoop

Published 2021-06-01 by Kevin Feasel

The Hadoop in Real World team hit on a naming scheme that I think is bad:

NameNode is the heart of HDFS. NameNode maintains the metadata of HDFS – files, list of blocks, directories, permissions etc. The metadata is persisted on a file named FSIMAGE. During the start up of NameNode, the FSIMAGE file will be read and loaded into memory.
Any ongoing changes to the files, directories in FSIMAGE will be written to memory and to a temporary log file. NameNode does not save the ongoing changes to FSIMAGE directly and this is because FSIMAGE file could be big for a big HDFS and updating a big file at runtime will be quite expensive and slow.

Read on to learn what the secondary NameNode does. As a hint, it’s not a secondary NameNode in the sense of high availability. If you’re a new Hadoop administrator, the name can be deceiving, letting you think you have high availability when you really don’t.

Comments closed

Handling Disaster Recovery

Published 2021-05-27 by Kevin Feasel

Randolph West has a disaster recovery plan:

I’ve had several occasions where hard drives have failed and attempts to recover data from these wonders of mechanical engineering have been mostly fruitless. I’ve experienced profound examples of data loss, in both cases losing years of email and contact details for people I met online.
This is all to say that I care deeply about data loss, and I take it personally when I’m asked to engage with potential customers to recover data in SQL Server.
This post is a high-level overview of how I tackle data recovery, whether personally or for professional consulting reasons.

Click through for the steps.

Comments closed

Category: HA / DR