
Yes, and that's exactly the problem. It's like choosing a microservice architecture for resiliency and building all the services on top of the same database or message queue without underlying redundancy.


afaik they have a tiered service architecture, where tier 1 services are allowed to rely on tier 0 services but not vice versa, and tier 0 services carry stricter reliability guarantees than tier 1.
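To make the tiering rule concrete, here is a minimal sketch of checking it against a dependency graph. The service names, the `tiers` and `depends_on` mappings, and the `tier_violations` helper are all made up for illustration; nothing here reflects how AWS actually enforces this.

```python
def tier_violations(tiers, depends_on):
    """Return (service, dependency) pairs that break the tiering rule.

    Assumed rule: a service may depend only on services whose tier
    number is less than or equal to its own (tier 0 is the most critical).
    """
    violations = []
    for service, deps in depends_on.items():
        for dep in deps:
            if tiers[dep] > tiers[service]:
                violations.append((service, dep))
    return violations


# Hypothetical services, purely for illustration.
tiers = {"dns": 0, "auth": 0, "queue": 1, "dashboard": 1}
depends_on = {
    "dns": [],
    "auth": ["dns", "queue"],   # tier 0 relying on tier 1: not allowed
    "queue": ["dns", "auth"],
    "dashboard": ["queue"],
}

print(tier_violations(tiers, depends_on))
```

A check like this could run in CI against a declared dependency manifest, so a tier 0 service picking up a tier 1 dependency gets flagged before it ships.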

It is kinda cool that even the worst AWS outages are still confined to a single region and not global.


There IS a huge amount of redundancy built into the core services but nothing is perfect.


DNS is always the single point of failure.

But I think what wasn't well considered was the async effect: if something is gone for 5 minutes, maybe it will be just fine, but when things are properly asynchronous, the workflows that have piled up during that time become a problem in themselves. Worst case, they turn into poison pills which then break the system again.
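One common way to keep a piled-up backlog from re-breaking things is to cap retries and park repeat failures in a dead-letter list. A rough sketch, assuming an in-memory queue; `drain_backlog`, `handler`, and `max_attempts` are hypothetical names, not anything from a real queueing library:

```python
from collections import deque


def drain_backlog(backlog, handler, max_attempts=3):
    """Drain a piled-up queue, parking messages that keep failing.

    Messages that fail max_attempts times are moved to a dead-letter
    list instead of being retried forever (the "poison pill" case).
    """
    queue = deque((msg, 0) for msg in backlog)
    processed, dead_letters = [], []
    while queue:
        msg, attempts = queue.popleft()
        try:
            handler(msg)
            processed.append(msg)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(msg)       # give up, keep for inspection
            else:
                queue.append((msg, attempts))  # retry after the rest of the queue
    return processed, dead_letters
```

A real system would also rate-limit the drain so the recovering service isn't flooded the moment it comes back, but that part is left out here.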



