To my understanding the main problem is DynamoDB being down, and DynamoDB is what a lot of AWS services use for their eventing systems behind the scenes. So there's probably something like 500 billion unprocessed events that'll need to get worked through once they get everything back online. It's gonna be a long one.
I wonder how many companies have properly designed their clients so that the timing before a re-attempt is randomised and the delay between re-attempts grows exponentially.
In short, if everyone retries on the same schedule you end up with surges of requests followed by lulls. You want that evened out to reduce stress on the server end.
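A minimal sketch of what that looks like, exponential backoff with full jitter; the function name and parameters here are just illustrative, not any particular SDK's API:

```python
import random
import time

def call_with_backoff(fn, max_attempts=6, base=0.5, cap=30.0):
    """Retry fn() with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Backoff window doubles with each failed attempt, up to a cap.
            window = min(cap, base * (2 ** attempt))
            # Full jitter: sleep a random amount within the window so clients
            # that failed at the same moment don't all retry at the same moment.
            time.sleep(random.uniform(0, window))
```

The jitter is the part that matters for the surge problem: without it, every client that failed together retries together.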
It's just a safe pattern that's easy to implement. If your services' back-off attempts happen to be synced, for whatever reason, then even if they're backing off and not slamming AWS with retries, they might slam your own backend when it comes back online.
It's also polite to external services, though at the scale of something like AWS that's not a concern for most.