Press "Enter" to skip to content

Category: HA / DR

HA/DR With Azure SQL Database

James Serra explains the High Availability and Disaster Recovery options available when using Azure SQL Database:

High Availability (HA) – Keeping your database up 100% of the time, with no data loss during common problems. Redundancy at the system level with a focus on failover; addresses single, predictable failures; the emphasis is on technology. SQL Server IaaS would handle this with:

  • Always On Failover Cluster Instances
  • Always On Availability Groups (in the same Azure region)
  • SQL Server data files in Azure

Disaster Recovery (DR) – Protection for when a major disaster or unusual failure wipes out your database. Use of an alternate site with a focus on re-establishing services; addresses multiple failures; includes the people and processes needed to execute the recovery. DR usually includes HA as well. SQL Server IaaS would handle this with:

  • Log Shipping
  • Database Mirroring
  • Always On Availability Groups (different Azure regions)
  • Backup to Azure (see the sketch below)
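
As a quick illustration of that last option, here's a minimal sketch of backing up directly to Azure blob storage with BACKUP TO URL. The storage account, container, and database names are made up, and the SAS-credential form shown here requires SQL Server 2016 or later:

    -- Hypothetical storage account, container, and database names
    CREATE CREDENTIAL [https://myaccount.blob.core.windows.net/backups]
        WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
        SECRET = '<SAS token, without the leading question mark>';
    GO

    BACKUP DATABASE SalesDb
        TO URL = N'https://myaccount.blob.core.windows.net/backups/SalesDb.bak'
        WITH COMPRESSION, CHECKSUM;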

Click through for more details.


Backing Up Cloudera Search Data

Eva Nahari explains different techniques for backing up Cloudera Search data, as well as how to set up disaster recovery:

If you have the raw data in HDFS (which most do, and which you should!), the most straightforward way to have a hot-warm disaster recovery setup is to use our Backup and Disaster Recovery tool. It allows you to set up regular incremental updates between two clusters. You then have the option of using MapReduce Indexer or Spark Indexer to regularly index the raw data in your recovery cluster and append to a running Solr service in that same recovery cluster. This way you can easily switch over from one Solr service to the backup Solr service if you experience downtime in the original cluster.

The lag would depend on the network between the clusters and how frequently you transfer data between them. To some extent, it would also depend on how much time you need (i.e., how many resources you have available) to complete the MapReduce or Spark indexing workload and append it (using the Cloudera Search GoLive feature) into the active Solr indexes on the recovery site.

Read on for several options.


Log Shipping With dbatools

Sander Stad has started a series on using dbatools to help set up log shipping.  Part one walks through the basics and setup:

Technically you don’t need multiple servers to set up log shipping. You can set it up with just a single SQL Server instance. In an HA solution this wouldn’t make sense, but technically it’s possible.

Having a separate server acting as the monitoring server ensures that when one of the servers goes down, the logging of the actions still takes place.

Having a separate network share for both the backup and the copy makes it easier to set up security and decide which accounts can access the backups. The backup share needs to be readable and writable by the primary instance and readable by the secondary instance. The copy share needs to be accessible and writable by only the secondary instance.
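
For a sense of what the dbatools cmdlets automate, here's a rough sketch of the native primary-side setup call that tooling like this wraps. The database name, paths, and job name are made up, the exact parameter list varies by version, and dbatools also handles the secondary-side procedures for you:

    -- Hypothetical database, paths, and job name
    EXEC master.dbo.sp_add_log_shipping_primary_database
        @database = N'SalesDb',
        @backup_directory = N'D:\LogShipping\Backup',
        @backup_share = N'\\primary\LogShipBackup',  -- writable by the primary, readable by the secondary
        @backup_job_name = N'LSBackup_SalesDb',
        @backup_retention_period = 4320;             -- minutes to keep backup files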

Part two is all about checking the status of a log shipping implementation:

Monitoring your log shipping processes is important. You need to know the synchronization status of the log-shipped databases.

The log shipping process consists of three steps: backup, copy, and restore. Log shipping tracks the status of these processes. It registers the last transaction log backup, the last file copied, and the last file restored. It also keeps track of the time since the last backup, copy, and restore.

But that’s not all. Log shipping also checks whether the thresholds for the backup and restore have been exceeded.
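
Those statuses live in msdb, so you can also eyeball them without dbatools. A quick sketch, run on the monitor server (or on the primary and secondary instances):

    -- Summary report of backup/copy/restore status and thresholds
    EXEC master.dbo.sp_help_log_shipping_monitor;

    -- Or query the monitor tables directly
    SELECT primary_database, last_backup_file, last_backup_date
    FROM   msdb.dbo.log_shipping_monitor_primary;

    SELECT secondary_database, last_copied_file, last_copied_date,
           last_restored_file, last_restored_date
    FROM   msdb.dbo.log_shipping_monitor_secondary;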

Log shipping is an underrated piece of the HA/DR puzzle, and Sander shows how easy dbatools makes it to configure.


So You Want To Monitor Kafka…

Ben Summer reviews a Kafka catastrophe and explains how to avoid it:

Here at New Relic, the Edge team is responsible for the pipelines that handle all the data coming into our company. We were an early adopter of Apache Kafka, which we began using to power this data pipeline. Our initial results were outstanding. Our cluster handled any amount of data we threw at it; it showed incredible fault tolerance and scaled horizontally. Our implementation was so stable for so long that we basically forgot about it. Which is to say, we totally neglected it. And then one day we experienced a catastrophic incident.

Our main cluster seized up. All graphs, charts, and dashboards went blank. Suddenly we were totally in the dark — and so were our customers. The incident lasted almost four hours, and in the end, an unsatisfactory number of customers experienced some kind of data loss. It was an epic disaster. Our Kafka infrastructure had been running like a champ for more than a year and suddenly it had ground to a halt.

This happened several years ago, but to this day we still refer to the incident as the “Kafkapocalypse.”

Ben has a couple of interesting stories and some good rules of thumb for maintaining a Kafka cluster.


Building A Disaster Recovery Checklist

Robert Davis has a checklist for you:

If a disaster struck tomorrow, are you ready for it? Are you sure you thought of everything? I cannot count the number of times something happened, and I thought that I should have been able to anticipate that. It can be a big benefit to get another viewpoint on your disaster plan. No matter how well you think you’ve covered yourself, someone who thinks about things differently than you do can often see something you don’t.

Here’s a checklist of things you might not have considered. This can be your other set of eyes.

This is not a to-do list or a process guide when disaster strikes; rather, it’s a checklist of things you should have in place before a disaster occurs.


Planning For A Good Disaster

Randolph West loves it when a (disaster recovery) plan comes together:

My first question to the attendee was what the Service Level Agreement (SLA) says. As we know from previous posts, a disaster recovery strategy is dictated by business requirements, not technical ones. The Recovery Point Objective (how much data loss is acceptable) and Recovery Time Objective (how much time there is to bring everything back) will guide my proposal.

He told me that the SLA was 24 hours, so I started writing on the whiteboard while thinking aloud.

On average, a fast storage layer can read and write around 200 MB/s, so it would take 5.12 seconds to write 1 GB, or just under 85 minutes to restore the database back to disk, not counting the log file or crash recovery. I never assume that Instant File Initialization is enabled, plus I won’t know how big the transaction log file is going to be, and that needs to be zeroed out.
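
If you want to check that arithmetic, it works out assuming a roughly 1 TB database (the excerpt doesn't state the size):

    -- 200 MB/s sequential throughput; 1 GB = 1,024 MB
    SELECT 1024.0 / 200 AS seconds_per_gb;                  -- 5.12 seconds
    -- ~1,000 GB of data file at 5.12 s/GB
    SELECT (1000 * 1024.0 / 200) / 60 AS minutes_for_1_tb;  -- ~85 minutes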

I like this post a lot because it lets us get a glimpse into Randolph’s thought process and gives some hard numbers that you should have in mind.


Backup And Recovery With Hadoop

Tim Spann explains how to perform backup/recovery operations and disaster recovery using Hadoop:

You can mirror datasets with Falcon. Mirroring is a very useful option for enterprises and is well-documented. This is something that you may want to get validated by a third party; see the resources linked in the original post.

Tim shows several recovery options, making it useful reading if you use Hadoop as a source system for anything (or if you can’t afford it to be down for a 2-3 day period as you recover data).


Log Shipping On Linux

Andrew Pruski digs into how to set up log shipping for SQL Server on Linux:

What I’m going to do is set up two instances of SQL Server running on Linux and log ship one database from one to the other. So the first thing I did was get two VMs running Ubuntu 16.04.1 LTS, which can be downloaded from here.

Once both servers were set up (remember to enable ssh), I then went about getting SQL Server set up. I’m not going to go through the install in this post, as the process is fully documented here. Don’t forget to also install the SQL tools; the full guide is here.
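
The T-SQL involved is the same as on Windows; log shipping just automates a backup/copy/restore loop along these lines (database name and paths here are made up):

    -- On the primary:
    BACKUP LOG LogShippedDb
        TO DISK = N'/var/opt/mssql/logship/LogShippedDb_001.trn';

    -- Copy the file to the secondary (e.g. via scp or rsync), then on the secondary:
    RESTORE LOG LogShippedDb
        FROM DISK = N'/var/opt/mssql/logship/LogShippedDb_001.trn'
        WITH NORECOVERY;  -- or WITH STANDBY = N'/var/opt/mssql/logship/undo.tds' for read-only access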

Read on for the guide, but also be sure to read his disclaimer.


Asynchronous Mirroring And Asynchronous AGs

Tara Kizer contrasts asynchronous database mirroring with asynchronous Availability Groups:

ASYNCHRONOUS DATABASE MIRRORING

For asynchronous Database Mirroring, all we need is two servers: the principal at the primary site and the secondary at the DR site. Set up async mirroring between the two, and you’re done. If the secondary server goes down, production still continues. The transaction log isn’t clearing when the log backups occur, because the principal still needs to send those log records to the secondary. As long as you have enough disk space where the transaction log resides to support this until the secondary server comes back online, production still continues. Of course, if you run out of disk space, users will start receiving errors. But that can take some time, and oftentimes that’s enough time to get the secondary back online.
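
For reference, the moving parts of async mirroring are small. A minimal sketch — the endpoint name, port, and server names are made up, and high-performance (asynchronous) mode requires Enterprise Edition:

    -- On each partner: create a mirroring endpoint
    CREATE ENDPOINT Mirroring
        STATE = STARTED
        AS TCP (LISTENER_PORT = 5022)
        FOR DATABASE_MIRRORING (ROLE = PARTNER);
    GO

    -- On the mirror (database restored WITH NORECOVERY first):
    ALTER DATABASE SalesDb SET PARTNER = N'TCP://principal.contoso.com:5022';

    -- On the principal:
    ALTER DATABASE SalesDb SET PARTNER = N'TCP://mirror.contoso.com:5022';
    ALTER DATABASE SalesDb SET PARTNER SAFETY OFF;  -- asynchronous (high-performance) mode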

Availability Groups are more difficult to implement than database mirroring, but the flipside is that they’re much more powerful.


Running CheckDB On Log Shipping Subscribers

Brent Ozar shows how you can run DBCC CHECKDB on a log shipping subscriber:

At first glance, if you set up log shipping the way most folks do, it looks like you can’t run CHECKDB on a subscriber.

On SQL Server Enterprise Edition, you can try working around it by creating a database snapshot – but that won’t work either.

Here’s the trick: your database needs to be in standby mode instead of norecovery. When you run restores with the standby option, you’re able to query the database in between restores.
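
In other words, a sketch of the restore step and the check (paths and names made up):

    -- Restore log backups WITH STANDBY instead of WITH NORECOVERY
    RESTORE LOG SalesDb
        FROM DISK = N'\\fileshare\logship\SalesDb_002.trn'
        WITH STANDBY = N'D:\LogShipping\SalesDb_undo.tds';
    GO

    -- The database is now readable between restores
    DBCC CHECKDB (SalesDb) WITH NO_INFOMSGS;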

This doesn’t obviate the need for running CHECKDB on a primary, but it can offload some of that work some of the time.
