
Internet, out.

Very big day for an engineering team indeed. Can't vibe code your way out of this issue...



Easiest day for engineers on-call everywhere except AWS staff. There’s nothing you can do except wait for AWS to come back online.

Pour one out for the customer service teams of affected businesses instead


Well, but tomorrow there will be CTOs asking for a contingency plan in case AWS goes down, even though planning, preparing, executing and keeping it up to date as the infra evolves will cost more than the X hours of AWS outage.

There are certainly organizations for which that cost is lower than the overall damage of their services being down due to an AWS fault, but tomorrow we will hear from CTOs of smaller orgs as well.


They’ll ask, in a week they’ll have other priorities and in a month they’ll have forgotten about it.

This will hold until the next time AWS has a major outage, rinse and repeat.


It's so true it hurts. If you are new in any infra/platform management position you will be scared as hell this week. Then you will learn that the feeling just disappears by itself in a few days.


Yep, when I was a young programmer I lived in dread of an outage, or worse, being responsible for a serious bug in production. Then I got to watch what happened when it happened to others (and that time I dropped the prod database at half past four on a Friday).

When everything is some varying degree of broken at all times, being responsible for a brief uptick in the background brokenness isn't the drama you think it is.

It would be different if the systems I worked on were true life-and-death (ATC/emergency services etc), but in reality the blast radius from my fucking up somewhere is monetary, and even at the biggest company I worked for it was constrained (while 100+K per hour from an outage sounds horrific, in reality the vast majority of that was made up when the service was back online; people still needed to order the thing in the end).


This applies to literally half of the random "feature requests" and "tasks" coming in from the business team that are urgent and needed to be done yesterday.


Lots of NextJS CTOs are gonna need to think about it for the first time too


He will then give it to the CEO, who says there is no budget for that.


Honestly? "Nothing because all our vendors are on us-east-1 too"


Not really true for large systems. We are doing things like deploying mitigations to avoid scale-in (e.g. services not receiving traffic incorrectly autoscaling down), preparing services for the inevitable storm, managing various circuit breakers, changing service configurations to ease the flow of traffic through the system, etc. We currently have 64 engineers in our on-call room managing this. There's plenty of work to do.
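For example, one of the quicker mitigations is just suspending scale-in on the affected Auto Scaling groups so that services seeing artificially low traffic don't terminate their capacity. A minimal sketch with boto3 (the group name "web-asg" and the choice of suspended processes are placeholders; which groups and processes actually need this differs per service):

    import boto3

    # Suspend scale-in so instances aren't terminated while traffic is
    # artificially low during the outage. "web-asg" is a placeholder name.
    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    autoscaling.suspend_processes(
        AutoScalingGroupName="web-asg",
        ScalingProcesses=["Terminate", "AlarmNotification"],
    )

    # Once traffic recovers, resume normal scaling:
    # autoscaling.resume_processes(
    #     AutoScalingGroupName="web-asg",
    #     ScalingProcesses=["Terminate", "AlarmNotification"],
    # )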


Well, some engineer somewhere made the recommendation to go with AWS, even tho it is more expensive than alternatives. That should raise some questions.


Engineer maybe, executive swindled by sales team? Definitely.


> Easiest day for engineers on-call everywhere

I have three words for you: cascading systems failure


Can confirm, pretty chill we can blame our current issues on AWS.


and by one I trust you mean a bottle.


>Can't vibe code your way out of this issue...

I feel bad for the people impacted by the outage. But at the same time there's a part of me that says we need a cataclysmic event to shake the C-Suite out of their current mindset of laying off all of their workers to replace them with AI, the cheapest people they can find in India, or in some cases with nothing at all, in order to maximize current quarter EPS.


/ai why is AWS down? can you bring it back up


Pour one out for everyone on-call right now.


After some thankless years preventing outages for a big tech company, I will never take an oncall position again in my life.

Most miserable working years I have had. It's wild how normalized working on weekends and evenings becomes in teams with oncall.

But it's not normal. Our users not being able to shitpost is simply not worth my weekend or evening.

And outside of Google you don't even get paid for oncall at most big tech companies! Company losing millions of dollars an hour, but somehow not willing to pay me a dime to jump in at 3AM? Looks like it's not my problem!


When I used to be on call for Cisco WebEx services, I got paid extra and got extra time off, even if nothing happened. In addition, we were enough people on the rotation that I didn't have to do it that often.

I believe the rules varied based on jurisdiction, and I think some had worse deals, and some even better. But I was happy with our setup in Norway.

Tbh I do not think we would have had what we had if it wasn't for the local laws and regulations. Sometimes worker-friendly laws can be nice.


As I was reading the parent, I was thinking “hm, doesn’t match my experience at Cisco!” So it’s funny to see your comment right after.


> And outside of Google you don't even get paid for oncall at most big tech companies.

What the redacted?


Welcome to the typical American salary abuse. There's even a specific legal cutout exempting information technology, scientific and artistic fields from the overtime pay requirements of the Fair Labor Standards Act.

There's a similar cutout for management, which is how companies like GameStop squeeze their retail managers. They just don't give enough payroll hours for regular employees, so the salaried (but poorly paid) manager has to cover all of the gaps.


It's also unnecessary at large companies, since there'll likely be enough offices globally to have a follow-the-sun model.


Follow the sun does not happen by itself. Very few if any engineering teams are equally split across thirds of the globe in such a way that (say) Asia can cover if both EMEA and the Americas are offline.

Having two sites cover the pager is common, but even then you only have 16 working hours at best and somebody has to take the pager early/late.


Not to mention that the knowledge, skills, experience to troubleshoot/recover is rarely evenly distributed across the teams.


We get TOIL (time off in lieu) for being on call.


"Your shitposting is very important to us, please stay on the site"


> But it's not normal. Our users not being able to shitpost is simply not worth my weekend or evening.

It is completely normal for staff to have to work 24/7 for critical services.

Plumbing, HVAC, power plant engineers, doctors, nurses, hospital support staff, taxi drivers, system and network engineers - these people keep our modern world alive, all day, every day. Weekends, midnights, holidays, every hour of every day someone is AT WORK to make sure our society functions.

Not only is it normal, it is essential and required.

It’s ok that you don’t like having to work nights or weekends or holidays. But some people absolutely have to. Be thankful there are EMTs and surgeons and power and network engineers working instead of being with their families on holidays or in the wee hours of the night.


Nice try at guilt-tripping people doing on-call, and doing it for free.

But to parent's points: if you call a plumber or HVAC tech at 3am, you'll pay for the privilege.

And doctors and nurses have shifts/rotas. At some tech places, you are expected to do your day job plus on-call. For no overtime pay. "Salaried" in the US or something like that.


And these companies often say "it's baked into your comp!" But you can typically get the same exact comp working an adjacent role with no oncall.


Then do that instead. What’s the problem with simply saying “no”?


Yup, that is precisely what I did and what I'm encouraging others to do as well.

Edit: On-call is not always disclosed. When it is, it's often understated. And finally, you can never predict being re-orged into a team with oncall.

I agree employees should still have the balls to say "no", but to imply there's no wrongdoing here on companies' part and that it's totally okay for them to take advantage of employees like this is a bit strange.

Especially for employees that don't know to ask this question (new grads) or can't say "no" as easily (new grads or H1Bs.)


You’re looking for a job in this economy with a ‘he said no to being on call’ in your job history.

This is plainly bad regulation: the market at large discovered the marginal price of oncall is zero, but it's rather obviously skewed in employers' favor.


Guilt tripping? Quite the opposite.

If you or anyone else are doing on-call for no additional pay, precisely nobody is forcing you to do that. Renegotiate, or switch jobs. It was either disclosed up front or you missed your chance to say “sorry, no” when asked to do additional work without additional pay. This is not a problem with on call but a problem with spineless people-pleasers.

Every business will ask you for a better deal for them. If you say “sure” to everything you’re naturally going to lose out. It’s a mistake to do so, obviously.

An employee’s lack of boundaries is not an employer’s fault.


First, you try to normalise it:

> It is completely normal for staff to have to work 24/7 for critical services.

> Not only is it normal, it is essential and required.

Now you come with the weak "you don't have to take the job" and this gem:

> An employee’s lack of boundaries is not an employer’s fault.

As if there isn't a power imbalance, or as if employers always disclose everything and never change their mind. But of course, let's blame those entitled employees!


No one dies if our users can't shitpost until tomorrow morning.

I'm glad there are people willing to do oncall. Especially for critical services.

But the software engineering profession as a whole would benefit from negotiating concessions for oncall. We have normalized work interfering with life so the company can squeeze a couple extra millions from ads. And for what?

Nontrivial amount of ad revenue lost? Not my problem if the company can't pay me to mitigate.


> Nontrivial amount of ad revenue lost? Not my problem if the company can't pay me to mitigate.

Interestingly, when I worked on analytics around bugs in the ads space, we found that there often wasn't actually an impact when advertisers were unable to create ads, as they just created all of them when the interface started working again.

Now, if it had been the ad serving or pacing mechanisms then it would've been a lot of money, but not all outages are created equal.


Not all websites are for shitposting. I can’t talk to my clients for whom I am on call because Signal is down. I also can’t communicate with my immediate family. There are tons of systems positively critical to society downstream from these services.

Some can tolerate downtime. Many can’t.


You could give them a phone call, you know. Pretty reliable technology.


No, actually, I can’t. My phone doesn’t have a phone number and can’t make calls.


Also: do you think the telephone network has no oncall engineers?


You know, there's this thing called shifts. You should look it up.


I expect it's their SREs who are dealing with this mess.


> Can't vibe code your way out of this issue...

Exactly. This time, some LLM providers are also down and can't help vibe coders on this issue.


Qwen3 on lm-studio running fine on my work Mac M3, what's wrong with yours?



