Yes, and that's exactly the problem. It's like choosing a microservice architecture for resiliency and building all the services on top of the same database or message queue without underlying redundancy.
AFAIK they have a tiered service architecture, where tier 1 services are allowed to rely on tier 0 services but not vice versa, and tier 0 services carry stricter reliability guarantees than tier 1.
It is kinda cool that the worst AWS outages are still within a single region and not global.
But I think what wasn't well considered was the async effect: if something is gone for 5 minutes, maybe everything will be just fine, but when things are properly asynchronous, the workflows that have piled up during that time become a problem in themselves. Worst case, they turn into poison pills which then break the system again.
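To make the poison pill point concrete, here is a minimal sketch of one common mitigation: give each queued message a bounded retry budget and park repeat failures in a dead-letter list so the rest of the backlog can drain. This is not a description of how AWS does it. The names `drain_backlog`, `handler`, and `MAX_ATTEMPTS` are made up for illustration, and a real system would also want backoff and rate limiting on the drain.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget per message


def drain_backlog(backlog, handler, max_attempts=MAX_ATTEMPTS):
    """Process a piled-up backlog without letting one bad message block it.

    Returns (processed, dead_letters). A message that fails max_attempts
    times is moved to dead_letters instead of being retried forever.
    """
    queue = deque((msg, 0) for msg in backlog)
    processed, dead_letters = [], []
    while queue:
        msg, attempts = queue.popleft()
        try:
            handler(msg)
            processed.append(msg)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                # Likely a poison pill: set it aside for inspection.
                dead_letters.append(msg)
            else:
                # Requeue at the back so other messages get a turn first.
                queue.append((msg, attempts))
    return processed, dead_letters
```

Without the retry cap, a message that always fails would be requeued indefinitely and the drain loop would never finish, which is roughly the "break the system again" case.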