
Easiest day for engineers on-call everywhere except AWS staff. There’s nothing you can do except wait for AWS to come back online.

Pour one out for the customer service teams of affected businesses instead



Well, but tomorrow there will be CTOs asking for a contingency plan in case AWS goes down, even if planning, preparing, executing, and keeping that plan up to date as the infra evolves will cost more than the X hours of AWS outage.

There are certainly organizations for which that cost is lower than the overall damage of services being down due to an AWS fault, but tomorrow we will hear from CTOs at smaller orgs as well.


They’ll ask, in a week they’ll have other priorities and in a month they’ll have forgotten about it.

This will hold until the next time AWS has a major outage; rinse and repeat.


It's so true it hurts. If you are new in any infra/platform management position you will be scared as hell this week. Then you will learn that the feeling just disappears by itself in a few days.


Yep, when I was a young programmer I lived in dread of an outage or, worse, being responsible for a serious bug in production. Then I got to watch what happened when it happened to others (and the time I dropped the prod database at half past four on a Friday).

When everything is some varying degree of broken at all times, being responsible for a brief uptick in the background brokenness isn't the drama you think it is.

It would be different if the systems I worked on were true life and death (ATC/emergency services, etc.), but in reality the blast radius from my fucking up somewhere is monetary, and even at the biggest company I worked for it was constrained (while 100K+ per hour from an outage sounds horrific, in reality the vast majority of that was made up when the service was back online; people still needed to order the thing in the end).


This applies to literally half of the random "feature requests" and "tasks" coming in from the business team that are urgent and needed to be done yesterday.


Lots of NextJS CTOs are gonna need to think about it for the first time too


He will then give it to the CEO who says there is no budget for that


Honestly? "Nothing because all our vendors are on us-east-1 too"


Not really true for large systems. We are doing things like deploying mitigations to avoid scale-in (e.g. services not receiving traffic incorrectly autoscaling down), preparing services for the inevitable storm, managing various circuit breakers, changing service configurations to ease the flow of traffic through the system, etc. We currently have 64 engineers in our on-call room managing this. There's plenty of work to do.
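
(For illustration only, not the commenter's actual tooling: a minimal sketch of one such scale-in mitigation, using boto3 to suspend dynamic scale-in on an ECS service via AWS Application Auto Scaling so a service that is temporarily receiving no traffic doesn't shrink its capacity during the outage. The cluster and service names are made up.)

    # Minimal sketch: suspend dynamic scale-in for an ECS service so a
    # temporary traffic drop during the outage doesn't reduce its capacity,
    # while still allowing scale-out for the recovery rush.
    # ResourceId below is a placeholder, not a real service.
    import boto3

    client = boto3.client("application-autoscaling", region_name="us-east-1")

    client.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId="service/prod-cluster/checkout-service",  # hypothetical
        ScalableDimension="ecs:service:DesiredCount",
        SuspendedState={
            "DynamicScalingInSuspended": True,    # block scale-in while traffic is abnormal
            "DynamicScalingOutSuspended": False,  # keep scale-out available
            "ScheduledScalingSuspended": False,
        },
    )

The same call with DynamicScalingInSuspended set back to False restores normal behavior once traffic stabilizes.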


Well, some engineer somewhere made the recommendation to go with AWS, even tho it is more expensive than alternatives. That should raise some questions.


Engineer? Maybe. Executive swindled by the sales team? Definitely.


> Easiest day for engineers on-call everywhere

I have three words for you: cascading systems failure


Can confirm, pretty chill we can blame our current issues on AWS.


and by one I trust you mean a bottle.



