The Internet is a Single Point of Failure
Published 11/21/2025
Resiliency through multicloud looks great on paper, but the reality is far more complex (and expensive).
Thanks to Amazon, Microsoft, and Google, my calendar over the past few weeks spiked with members calling to discuss cloud resiliency. Each of these outages was a rare event, and none of them shared a common cause, but we humans have this pesky habit of getting worried when there’s an uptick in similar-sounding incidents. (It’s probably tied to a deep survival instinct to recognize that when three of our neighbors get eaten by bears, maybe it’s time to try to not get eaten by a bear.)
Every one of these calls unfortunately invoked the word “multicloud”.
Why did I say, “unfortunately”? Because multicloud should be your absolute last resiliency option, and for most organizations it won’t help anyone except the extra cloud provider(s) you’re now feeding cash. It’s technically complex, expensive, and might break anyway since the Internet itself isn’t totally reliable.
The Reality of Multicloud Resiliency
Let’s get it out of the way first — yes, you can build truly multicloud applications. Yes, it may improve your resiliency. But it comes at a high price and introduces additional failure modes that can more than offset any uptime gains you’re chasing.
For some enterprises and some of their applications, it’s worth considering. But multicloud resiliency should be the last option after you’ve established bombproof single cloud resiliency.
Here’s why multicloud resiliency is so difficult and expensive:
- The foundational technologies of each provider are fundamentally different down to the lowest levels.
- Multiple versions of the application architecture will need to be maintained and kept in synchronization.
- Containers are not truly cloud agnostic since the management plane and any PaaS usage are cloud specific.
- Few organizations have operational maturity on multiple cloud providers.
- Few organizations have security maturity on multiple cloud providers.
- Few organizations are properly skilled and staffed for multiple cloud providers.
- Moving data between providers is expensive. Synchronizing live data without loss is even more expensive.
- You will still have Internet-level failure points like DNS, CDN (as we just saw), network exchanges (when using private networks), and cross-provider Internet connectivity (when using only the Internet).
That last one is an important point: the Internet itself is a single point of failure! Your DNS can fail. Your CDN can fail. Your DNS can keep working while cached records still route traffic to the downed provider. Or maybe your backhaul network will go down, or a routing or BGP issue will leave data out of sync just enough to fail a transaction in an… unfortunate way.
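To make the cached-DNS problem concrete, here’s a minimal sketch (all names and numbers are illustrative assumptions, not from any real outage): after you flip DNS to the surviving provider, every client whose cached answer is younger than the record’s TTL keeps sending traffic to the dead one.

```python
# Sketch: DNS failover is not instant. Clients honor the TTL of their
# cached answer, so after a failover they keep hitting the old (failed)
# provider until that TTL expires. Ages and TTL below are hypothetical.

def stale_fraction(cache_ages_s: list[float], ttl_s: float) -> float:
    """Fraction of clients whose cached record is still 'fresh' (age < TTL)
    and therefore still points at the failed provider."""
    if not cache_ages_s:
        return 0.0
    stale = sum(1 for age in cache_ages_s if age < ttl_s)
    return stale / len(cache_ages_s)

# Seven clients that resolved 10–600 seconds before the failover,
# against a common 300-second TTL:
ages = [10, 60, 120, 240, 290, 400, 600]
fraction = stale_fraction(ages, ttl_s=300)
```

Here five of the seven clients (about 71%) would still route to the downed provider immediately after failover — which is why shortening TTLs is a standard (if imperfect) part of any DNS-based failover plan.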
Moving to multicloud increases cost, complexity, and the potential points of failure.
Now, can you design for, test, and achieve greater resilience than in a single provider? Absolutely! If you have enough money. If you are willing to double your storage costs, increase your networking costs, and pay for the people with the right skills to build and maintain on multiple providers.
And some applications are legitimately more aligned to a multicloud deployment: ones where data synchronization needs are low and application functionality is more self-contained, relying less on the PaaS capabilities of the cloud provider.
Designing Cloud Resilience
Resiliency needs to be realistic, pragmatic, tested, and aligned with actual business requirements. I can’t tell you how many times I hear people tossing out requirements for 4-5 9’s of availability without any reasonable justification. It’s also important to know the uptime and capabilities of your provider. There are clear differences even among the hyperscalers, especially when you start looking into regional differences.
We cover resiliency in chapter 11.6 of the CSA Security Guidance. It starts with a basic hierarchy:
- Start by building single-region resiliency. Pro-tip: research and pick a more-resilient region.
- Then consider moving into multi-region resiliency within a single cloud provider. Pro-tip: be very familiar with your provider’s global services dependencies and how they affect regional availability.
- Only consider multicloud resiliency for the most critical of applications, and make sure you account for those external dependencies (DNS/CDN/Network/etc.).
The reality is that multi-region is the sweet spot: availability most traditional datacenters can’t dream of, for a reasonable cost increase.
In the Guidance we also cover 4 major resiliency tools: architecture (e.g. autoscaling/serverless), IaC, automation, and… chaos engineering.
I love chaos engineering. I love the idea of automatically breaking things and injecting faults into production to ensure that everything deployed is prepared for things to fail. In security we like to talk about the “assume breach” mentality, and with chaos engineering we “assume failure”.
Which brings me to my rule of thumb when advising on resiliency:
Don’t even think about multicloud resiliency until you’ve built for high-availability in a single cloud, and implemented chaos engineering.
Otherwise you’re just doubling your complexity and costs without knowing if it will even work.