Clouds as Single Points of Failure

Denny Cherry argues that you should not consider a cloud provider as a single point of failure:

Having two cloud providers isn’t going to save you from an outage. The public cloud providers (Microsoft Azure, Amazon AWS, Google GCP, etc.) have specifically designed their networks so that an outage at one region doesn’t impact other regions.

The day before US Thanksgiving (November 25, 2020), AWS had a major outage where the east-us facility suffered an outage for several hours. But you’ll notice something very interesting about this outage. No other AWS region was impacted by this outage. This is a very important distinction, as it shows that having multiple regions within AWS would give a solid Disaster Recovery strategy a great fail-over experience.

I’m mostly in agreement with Denny on this, but I’d also point to the Azure AD issue which crippled Azure work across the globe, and to the Azure DevOps service going down for a period of time (because everything was hosted in one data center and there was an issue). These are failures that spreading across regions within a single provider does not necessarily protect against. Depending on just how important uptime is, it can still make sense to be multi-cloud, especially if we use a definition broad enough to include on-premises as a “local cloud.” In extreme cases (say, you lose millions of dollars per hour of downtime), the cost of a belt-and-suspenders approach is well below the expected loss from an outage.
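
To make the expected-loss comparison concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is made up for illustration (the outage frequency, the duration, the hourly loss, the cost of the second environment, and the share of downtime it would actually avoid), and the function names are my own. Plug in your own estimates.

```python
def expected_annual_outage_loss(outages_per_year, hours_per_outage, loss_per_hour):
    """Expected yearly loss = expected outages * average length * hourly cost."""
    return outages_per_year * hours_per_outage * loss_per_hour


def redundancy_pays_off(expected_loss, redundancy_cost_per_year, fraction_of_loss_avoided):
    """True when the loss a second environment avoids exceeds what it costs."""
    return expected_loss * fraction_of_loss_avoided > redundancy_cost_per_year


# Hypothetical inputs: one provider-wide outage every two years, four hours
# long, at $2 million per hour of downtime.
loss = expected_annual_outage_loss(
    outages_per_year=0.5,
    hours_per_outage=4.0,
    loss_per_hour=2_000_000,
)

# Hypothetical: the second cloud (or on-premises site) costs $1.5 million a
# year and would avoid 80% of that downtime.
worth_it = redundancy_pays_off(
    expected_loss=loss,
    redundancy_cost_per_year=1_500_000,
    fraction_of_loss_avoided=0.8,
)

print(f"Expected annual loss: ${loss:,.0f}")
print(f"Second environment pays for itself: {worth_it}")
```

With these invented figures the second environment comes out ahead, but drop the hourly loss to something like $10,000 and the same arithmetic says multi-region within one provider is plenty.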