
Easiest day for engineers on-call everywhere except AWS staff. There’s nothing you can do except wait for AWS to come back online.

Pour one out for the customer service teams of affected businesses instead



Well, but tomorrow there will be CTOs asking for a contingency plan in case AWS goes down, even if planning, preparing, executing, and keeping that plan up to date as the infra evolves will cost more than the X hours of AWS outage.

There are certainly organizations for which that cost is lower than the overall damage of services being down due to an AWS fault, but tomorrow we will hear from CTOs at smaller orgs as well.


They’ll ask, in a week they’ll have other priorities and in a month they’ll have forgotten about it.

This will hold until the next time AWS has a major outage; rinse and repeat.


It's so true it hurts. If you are new in any infra/platform management position you will be scared as hell this week. Then you will learn that the feeling just disappears by itself in a few days.


Yep, when I was a young programmer I lived in dread of an outage or, worse, being responsible for a serious bug in production. Then I got to watch what happened when it happened to others (and the time I dropped the prod database at half past four on a Friday).

When everything is some varying degree of broken at all times, being responsible for a brief uptick in the background brokenness isn't the drama you think it is.

It would be different if the systems I worked on were true life and death (ATC/emergency services, etc.), but in reality the blast radius from my fucking up somewhere is monetary, and even at the biggest company I worked for it was constrained (while 100K+ per hour from an outage sounds horrific, in reality the vast majority of that was made up when the service was back online; people still needed to order the thing in the end).


This applies to literally half of the random "feature requests" and "tasks" coming in from the business team that are urgent and needed to be done yesterday.


Lots of NextJS CTOs are gonna need to think about it for the first time too


He will then give it to the CEO who says there is no budget for that


Honestly? "Nothing because all our vendors are on us-east-1 too"


Not really true for large systems. We are doing things like deploying mitigations to avoid scale-in (e.g. services not receiving traffic incorrectly autoscaling down), preparing services for the inevitable storm, managing various circuit breakers, changing service configurations to ease the flow of traffic through the system, etc. We currently have 64 engineers in our on-call room managing this. There's plenty of work to do.
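
(For illustration only, not the commenter's actual tooling: a minimal sketch of one such scale-in mitigation, using boto3 to suspend dynamic scale-in on an ECS service via AWS Application Auto Scaling so a service that is temporarily receiving no traffic doesn't shrink its capacity during the outage. The cluster and service names are made up.)

    # Minimal sketch: suspend dynamic scale-in for an ECS service so a
    # temporary traffic drop during the outage doesn't reduce its capacity,
    # while still allowing scale-out for the recovery rush.
    # ResourceId below is a placeholder, not a real service.
    import boto3

    client = boto3.client("application-autoscaling", region_name="us-east-1")

    client.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId="service/prod-cluster/checkout-service",  # hypothetical
        ScalableDimension="ecs:service:DesiredCount",
        SuspendedState={
            "DynamicScalingInSuspended": True,    # block scale-in while traffic is abnormal
            "DynamicScalingOutSuspended": False,  # keep scale-out available
            "ScheduledScalingSuspended": False,
        },
    )

The same call with DynamicScalingInSuspended set back to False restores normal behavior once traffic stabilizes.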


Well, some engineer somewhere made the recommendation to go with AWS, even tho it is more expensive than alternatives. That should raise some questions.


Engineer? Maybe. Executive swindled by the sales team? Definitely.


> Easiest day for engineers on-call everywhere

I have three words for you: cascading systems failure


Can confirm, pretty chill we can blame our current issues on AWS.


and by one I trust you mean a bottle.



