Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

They should continue to retry but with exponential backoff and jitter. Not in a busy loop!


If the reliability of your system depends upon the competence of your customers then it isn't very reliable.


Have you ever built a service designed to operate at planetary scale? One that's built of hundreds or thousands of smaller service components?

There's no such thing as infinite scalability. Even the most elastic services are not infinitely elastic. When resources are short, you either have to rely on your customers to retry nicely, or you have to shed load during overload scenarios to protect goodput (which will deny service to some). For a high demand service, overload is most likely during the first few hours after recovery.

See e.g., https://d1.awsstatic.com/builderslibrary/pdfs/Resilience-les...


Probably stupid question (I am not a network/infra engineer) - can you not simply rate limit requests (by IP or some other method)?

Yes your customers may well implement stupidly aggressive retries, but that shouldn't break your stuff, they should just start getting 429s?


Load shedding effectively does that. 503 is the correct error code here to indicate temporary failure; 429 means you've exhausted a quota.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: