
Choosing us-east-1 as your primary region is good, because when you're down, everybody's down, too. You don't get this luxury with other US regions!


One unexpected upside of moving from a DC to AWS is that when a region is down, customers are far more understanding. Instead of being upset, they often shrug it off since nothing else they needed/wanted was up either.


This is a remarkable and unfair truth. I have had this experience with Office365...when they're down a lot of customers don't care because all their customers are also down.


It took me so long to realise this is what's important in enterprise. Uptime isn't important, being able to blame someone else is what's important.

If you're down for 5 minutes a year because one of your employees broke something, that's your fault, and the blame passes down through the CTO.

If you're down for 5 hours a year but the outage affected other companies too, it's not your fault.

From AWS to Crowdstrike - system resilience and uptime isn't the goal. Risk mitigation isn't the goal. Affordability isn't the goal.

When the CEO's buddies all suffer at the same time as he does, it's just an "act of god" and nothing can be done; it's such a complex outcome that even the amazing boffins at aws/google/microsoft/cloudflare/etc can't cope.

If the CEO is down at a different time than the CEO's buddies, then it's because Dave/Charlie/Bertie/Alice can't cope, and it's the CTO's fault for not outsourcing it.

As someone who likes to see things working, it pisses me off no end, but it's the way of the world, and likely has been for as long as the owner and the CTO have been separate people.


A slightly less cynical view: execs have a hard filter for “things I can do something about” and “things I can’t influence at all.” The bad ones are constantly pushing problems into the second bucket, but there are legitimately gray area cases. When an exec smells the possibility that their team could have somehow avoided a problem, that’s category 1 and the hammer comes down hard.

After that process comes the BS and PR step, where reality is spun into a cotton candy that makes the leader look good no matter what.


> It took me so long to realise this is what's important in enterprise. Uptime isn't important, being able to blame someone else is what's important.

Yes.

What is important is having a Contractual SLA that is defensible. Acts of God are defensible. And now major cloud infrastructure outages are too.


“No one ever got fired for hiring IBM”


Sometimes we all need a tech shutdown.


As they say, every cloud outage has a silver lining.

* Give the computers a rest, they probably need it. Heck, maybe the Internet should just shut down in the evening so everyone can go to bed (ignoring those pesky timezone differences)

* Free chaos engineering at the cloud provider region scale, except you didn't opt in to this one or know about it in advance, making it extra effective

* Quickly map out which of the things you use have a dependency on a single AWS region with no capability to change or re-route (a rough sketch of this follows below)
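
A minimal sketch of that mapping, assuming you keep your own list of dependency health endpoints (the service names and URLs below are placeholders): during an outage, probe each one and note which fail, since anything that only dies while us-east-1 is down is a likely single-region dependency.

    # Hypothetical dependency map: replace with the endpoints you actually rely on.
    import urllib.request
    import urllib.error

    DEPENDENCIES = {
        "payments": "https://api.payments.example.com/health",
        "email":    "https://mail.example.com/health",
        "auth":     "https://auth.example.com/health",
    }

    def reachable(url: str, timeout: float = 5.0) -> bool:
        """True if the endpoint answers with any HTTP status, False on a network-level failure."""
        try:
            urllib.request.urlopen(url, timeout=timeout)
            return True
        except urllib.error.HTTPError:
            return True   # got an HTTP response, so the service is at least reachable
        except (urllib.error.URLError, TimeoutError):
            return False  # no response at all: likely caught up in the regional outage

    if __name__ == "__main__":
        for name, url in DEPENDENCIES.items():
            print(f"{name:10s} {'up' if reachable(url) else 'DOWN'}")

Run it once during normal operation and once mid-outage; the diff is your dependency map.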


Back in the day people used to shut down mail servers at the weekend, maybe we should start doing that again.


This still happens in some places. In various parts of Europe there are legal obligations not to email employees out of hours if it is avoidable. Volkswagen famously adopted a policy in Germany of only enabling receipt of new email for most of their employees from 30 minutes before the start of the working day until 30 minutes after the end, with weekends switched off entirely. You can leave work on Friday knowing you won't receive further emails until Monday.

> https://en.wikipedia.org/wiki/Right_to_disconnect


B&H shuts down their site for the sabbath.


Disconnect day


I was once told that our company went with Azure because when you tell the boomer client that our service is down because Microsoft had an outage, they go from being mad at you, to accepting that the outage was an act of god that couldn’t be avoided.


Azure outages: happens all the time, understandable, no way to prevent this

AWS outages: almost never happens, you should have been more prepared for when it does


The 50-year-old executive using the software doesn't know what an AWS is and hardly knows what Amazon does outside of selling junk.

If you say it’s Microsoft then it’s just unavoidable.


I am down with that, let's all build in us-east-1.


Is us-east-1 just as unstable as the other regions? My impression was that Amazon deploys changes to us-east-1 first, so it's the most unstable region.


I've heard this so many times and not seen it contradicted so I started saying it myself. Even my last Ops team wanted to run some things in us-east-1 to get prior warning before they broke us-west-1.

But there are some people on Reddit who think we are all wrong but won't say anything more. So... whatever.

Nothing in the outage history really stands out as "this is the first time we tried this and oops" except for us-east-1.

It's always possible for things to succeed at a smaller scale and fail at full scale, but again none of them really stand out as that to me. Or at least, not any in the last ten years. I'm allowing that anything older than that is on the far side of substantial process changes and isn't representative anymore.


You'd think Amazon would safeguard their biggest region more, but I have no idea; I've never worked at AWS.


I do know from previous discussions that some companies are in us-east-1 because of business partnerships with other inhabitants, and if one moves out, the costs and latency go up. So they are all stuck in this boat together.

Still, it would make a bit of sense, if you can find a place in your code where crossing a region hurts less, to move some of your services to a different region (a rough sketch below).

While your business partners will understand that you’re down while they’re down, will your customers? You called yesterday to say their order was ready, and now they can’t pick it up?
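
As for finding a place where crossing a region hurts less: a minimal sketch using boto3, with hypothetical resource names. Keep the latency-sensitive path in the shared region, and pin something that tolerates a cross-region round trip (say, report exports) to a second region so it survives a us-east-1 outage.

    import boto3  # assumes the usual AWS credentials are configured

    # Hot path stays in the region the business partners are in.
    sqs_primary = boto3.client("sqs", region_name="us-east-1")

    # Cold path: nightly report exports tolerate the extra cross-region latency.
    s3_reports = boto3.client("s3", region_name="us-west-2")

    def export_report(body: bytes) -> None:
        # Hypothetical bucket, created in us-west-2 ahead of time.
        s3_reports.put_object(Bucket="example-reports-usw2", Key="reports/daily.csv", Body=body)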


And all your dependencies are co-located.


Doing pretty well up here in Tokyo region for now! Just can't log into console and some other stuff.


Check the URL; we had an issue a couple of years ago with WorkSpaces. US East was down but all of our stuff was in the EU.

Turns out the default URL was hardcoded to the us-east interface, and just going to WorkSpaces and editing the URL to point at your local region got everyone working again.

Unless you mean nothing is working for you at the moment.
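
For anyone who hits the same thing, a minimal sketch, assuming the console URL carries the region as a query parameter the way editing it by hand suggests (the exact URL shape is an assumption, check your own bookmark). It just rewrites the hardcoded region to wherever your resources actually live.

    # Rewrites the 'region' query parameter in a console-style URL.
    from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

    def pin_region(url: str, region: str) -> str:
        """Replace (or add) the 'region' query parameter in the given URL."""
        parts = urlparse(url)
        query = parse_qs(parts.query)
        query["region"] = [region]
        return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

    # Hypothetical bookmark that defaults to us-east-1:
    print(pin_region(
        "https://console.aws.amazon.com/workspaces/v2/home?region=us-east-1",
        "eu-west-1",
    ))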


Doesn't this mean you are not regionally isolated from us-east-1?



