To my understanding the main problem is DynamoDB being down, and DynamoDB is what a lot of AWS services use for their eventing systems behind the scenes. So there's probably something like 500 billion unprocessed events that'll need to get worked through once they get everything back online. It's gonna be a long one.
I wonder how many companies have properly designed their clients so that the timing before a re-attempt is randomised and the delay between re-attempts grows exponentially.
In short, if everyone retries on the same schedule you end up with surges of requests followed by lulls. You want that evened out to reduce stress on the server end.
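A minimal sketch of what that looks like, exponential backoff with full jitter; the function name and parameters here are just illustrative, not any particular SDK's API:

```python
import random
import time

def call_with_backoff(fn, max_attempts=6, base=0.5, cap=30.0):
    """Retry fn() with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Backoff window doubles with each failed attempt, up to a cap.
            window = min(cap, base * (2 ** attempt))
            # Full jitter: sleep a random amount within the window so clients
            # that failed at the same moment don't all retry at the same moment.
            time.sleep(random.uniform(0, window))
```

The jitter is the part that matters for the surge problem: without it, every client that failed together retries together.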
It's just a safe pattern that's easy to implement. If your services' back-off attempts happen to be synced, for whatever reason, then even if they're backing off and not slamming AWS with retries, they might slam your own backend when it comes back online.
It's also polite to external services, though at the scale of something like AWS that's not a concern for most.