Yes, and that's exactly the problem. It's like choosing a microservice architect...

pm90 · 2025-10-20T11:20:40 1760959240

afaik they have a tiered service architecture, where tier 1 services are allowed to rely on tier 0 services but not vice versa, and tier 0 services carry stricter reliability guarantees than tier 1.
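To make the tiering rule concrete, here is a rough sketch of how such a rule could be checked. The service names, tier assignments, and dependency map are all made up for illustration; this is not how AWS actually does it, just the "a lower tier must not depend on a higher one" idea in code.

```python
# Hypothetical services and tiers, for illustration only.
TIERS = {"dns": 0, "identity": 0, "queue": 1, "dashboard": 1}

# Hypothetical dependency map: service -> services it relies on.
DEPENDS_ON = {
    "dns": [],
    "identity": ["dns"],
    "queue": ["identity", "dns"],
    "dashboard": ["queue"],
}


def tier_violations(tiers, depends_on):
    """Return (service, dependency) pairs where a service relies on a
    dependency in a higher-numbered (less critical) tier."""
    return [
        (svc, dep)
        for svc, deps in depends_on.items()
        for dep in deps
        if tiers[dep] > tiers[svc]
    ]
```

A check like this could run in CI, so a tier 0 service picking up a tier 1 dependency fails the build instead of surfacing during an outage.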

It is kinda cool that the worst AWS outages are still within a single region and not global.

UltraSane · 2025-10-20T14:30:51 1760970651

There IS a huge amount of redundancy built into the core services but nothing is perfect.

Aperocky · 2025-10-21T12:16:21 1761048981

DNS is always the single point of failure.

But I think what wasn't well considered was the async effect. If something is gone for 5 minutes, maybe it will be just fine, but when things are properly asynchronous, the workflows that have piled up during that time become a problem in themselves. Worst case, they turn into poison pills which then break the system again.
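A minimal sketch of the usual defense against poison pills, assuming an in-memory backlog and a made-up retry budget (`drain`, `MAX_ATTEMPTS`, and the dead-letter list are my own names, not anything from a real queueing system): cap the retries per job and park whatever keeps failing, so one bad job can't wedge the recovery.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget per job


def drain(backlog, handler, max_attempts=MAX_ATTEMPTS):
    """Work through a piled-up backlog, parking jobs that keep failing
    instead of retrying them forever."""
    queue = deque((job, 0) for job in backlog)
    done, dead_letter = [], []
    while queue:
        job, attempts = queue.popleft()
        try:
            handler(job)
            done.append(job)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                # Poison pill: set it aside for inspection.
                dead_letter.append(job)
            else:
                # Requeue at the back so healthy jobs are not blocked.
                queue.append((job, attempts))
    return done, dead_letter
```

This only covers the poison-pill half. The pile-up itself still needs rate limiting or load shedding while the backlog drains, otherwise the recovery traffic can knock things over again.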