Hacker News | mprime1's comments

> 5.9k stars, 727 forks, 132 contributors: https://github.com/nats-io/nats.go

That's the NATS Go _client_.

The server project is https://github.com/nats-io/nats-server 17k stars, 1.5k forks, 160 contributors


Look at the contribution history; basically all active contributors work for Synadia: https://github.com/nats-io/nats-server/graphs/contributors

That's not a healthy, functioning open source community. Fewer than 30 people have made more than 10 commits; most of the 160 were "drive-by" contributors who fixed a single small thing.


Good point!


On the face of it, that sounds healthy. But dig in further, and you discover that only the top 10 or so contributors have enough activity to make their personal contribution graph anything other than a flat line on zero.

The vast majority of the work here is being done by a very small group of people - likely those paid by the commercial sponsor (a random sample of the top few suggests that well over half of those top contributors fall into that category).

If those people aren't paid to work on the project any more, it will likely die very quickly.


> On the face of it, that sounds healthy. But dig in further, and you discover that only the top 10 or so contributors have enough activity to make their personal contribution graph anything other than a flat line on zero.

This is something I always do manually and it is annoying. The number of contributors we see on the front page of a project on GitHub is misleading and can easily be faked by a malicious actor, because a user who corrected a typo 5 years ago is counted as a contributor.

GitHub should really improve that.
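
If it helps, here's a rough way to automate that check with the public GitHub contributors API (unauthenticated and first page only, so treat the numbers as approximate):

  # Count contributors with more than N commits via the GitHub API.
  # Unauthenticated requests are rate-limited and per_page caps at 100,
  # so this is a rough signal, not an exact census.
  import json
  import urllib.request

  def active_contributors(repo, threshold=10):
      url = f"https://api.github.com/repos/{repo}/contributors?per_page=100"
      with urllib.request.urlopen(url) as resp:
          contributors = json.load(resp)
      return [c["login"] for c in contributors if c["contributions"] > threshold]

  active = active_contributors("nats-io/nats-server")
  print(f"{len(active)} contributors with >10 commits")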


Mastodon is another one (ActivityPub / Fediverse)

https://nlnet.nl/project/Mastodon/


> never really understood how just adding extra invisible stuff to make the equations work is justified

IANAP but here’s my understanding.

At the end of the month you’ve spent $2000. You’re not sure how, so you track down your expenses:

  - rent $500
  - groceries $120
  - gas $80
  - …
  - unknown: $123
That ‘unknown’ is dark matter. It’s a placeholder: it’s there and makes the total add up, but you can’t explain it yet.


What if dark matter is actually 10 different items? Or you have rent and gas wrong, or rent and gas inexplicably depend on each other in some way?


My anecdotal evidence confirms this. Friends in my home country are hooked on Telegram meme groups and as a result they’re spreading [Russian propaganda] conspiracy theories at alarming levels.

(Russia is really good at weaponizing memes but I don’t mean to single it out; the US has also been very successfully influencing the same country for decades, via Hollywood movies for example)


Stay humble

  - There is someone else that is better at X than you are
  - The person not so good at X is probably better than you at Y


Indeed. Borg, for example, is end-to-end encrypted but still able to dedupe.

My bookmark archive is 10TB, but its deduped on-disk size is 100GB because most files are the same across backups!

https://www.borgbackup.org/
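
For intuition on how end-to-end encryption and dedupe can coexist, here's a toy sketch (not Borg's actual chunking or crypto): chunks are identified by a keyed hash of the plaintext, so identical chunks are stored only once, and the chunk data is encrypted client-side before it ever reaches the repo.

  # Toy dedupe-before-encrypt sketch (NOT Borg's real format).
  import hashlib, hmac, os

  CHUNK = 4 * 1024 * 1024   # fixed-size chunks; Borg uses content-defined chunking
  ID_KEY = os.urandom(32)   # secret, so the server can't guess chunk contents

  def chunk_id(data):
      return hmac.new(ID_KEY, data, hashlib.sha256).hexdigest()

  def encrypt(data):
      return data[::-1]     # placeholder! use real authenticated encryption

  store = {}                # chunk_id -> encrypted chunk ("the repo")

  def backup(path):
      ids = []
      with open(path, "rb") as f:
          while chunk := f.read(CHUNK):
              cid = chunk_id(chunk)
              if cid not in store:          # dedupe: chunk already uploaded?
                  store[cid] = encrypt(chunk)
              ids.append(cid)
      return ids                            # an "archive" is just a list of chunk ids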


That’s not the same thing at all.


Same thing as what?

Parent was asking about deduping encrypted data.

Someone said (wrongly) it’s impossible and I shared a popular project that does exactly that.


“Not backing up the same file twice” is not the same thing as deduplicating encrypted data, as encryption has no relevance there. You can do that with or without encryption.


I had a fascinating conversation with someone that has been working on this system for the last 20 or so years.

Water treatment is powered by the geyser and in turn the leftover brown water feeds the geyser.

Pretty neat (for sewage)!


Telling people "here's a recipe to do locks" should come with a giant flashing sign that says: "this is not an actual lock (as in an in-process lock) -- locks in distributed systems are impossible, so this cannot be used as a recipe for mutual exclusion"


Thank you for the link GP!

I’m the ‘maintainer’ but I’m hands off and not planning on significant improvements.

Discussion on HN was also quite interesting and you may find some ideas: https://news.ycombinator.com/item?id=34083366

I also recently presented this at HOPE(.net) and it was very well received by a technical crowd, so congrats on independently inventing the same thing ;-)


Leader election and distributed locking reduce to the same problem… which is proven to be impossible. It means that in some edge case it will fail on you; is your system handling those cases?

I didn’t read past this:

> Systems like Apache ZooKeeper or Postgres (via Advisory Locks) provide the required building blocks for this

ZooKeeper is the original sin. Convincing a whole generation of programmers that distributed locks are a feasible solution.

This is my biggest pet peeve in distributed systems.

——

And if you don’t believe me, maybe you’ll trust Kyle K. of Jepsen fame:

> However, perfect failure detectors are impossible in asynchronous networks.

Links to: https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p225-...

https://jepsen.io/analyses/datomic-pro-1.0.7075


> which is proven to be impossible.

'Technically' intractable problems are solvable just fine in a way that is almost as useful as solving them completely if you can achieve one of two things:

* Reliably identify when you've encountered an unsolvable case (usefulness of this approach depends on the exact problem you're solving).

or

* Reduce the probability of unsolvable cases/incorrect solutions to a level low enough to not actually happen in practice.

'Technically' GUIDs are impossible, reliable network communication (TCP) is impossible, O(n^2) time complexity functions will grow to unusably large running times - but in practice all of these things are used constantly to solve real problems.


"Distributed locks" are at best a contention-reduction mechanism. They cannot be used to implement mutual exclusion that is _guaranteed_ to work.

I've seen way too many systems where people assume TCP == perfectly reliable and distributed locks == mutual exclusion. Which of course is not the case.


> Convincing a whole generation of programmers that distributed locks are a feasible solution.

I too hate this. Not just because the edge cases exist, but also because of the related property: it makes the system very hard to reason about.

Questions that should be simple become complicated. What happens when the distributed locking system is down? What happens when we reboot all the nodes at once? What if they don't come down at exactly the same time and there's leader churn for like 2 minutes? Etc, etc.

Those questions should be fairly simple, but become something where a senior dev is having to trace codepaths and draw on a whiteboard to figure it out. It's not even enough to understand how a single node works in depth; they have to figure out not only how this node works but also how this node's state might impact another node's.

All of this is much simpler in leaderless systems (where the leader system is replaced with idempotency or a scheduler or something else).

I very strongly prefer avoiding leader systems; it's a method of last resort when literally nothing else will work. I would much rather scale a SQL database to support the queries for idempotency than deal with a leader system.

I've never seen an idempotent system switch to a leader system, but I've sure seen the reverse a few times.


>> Convincing a whole generation of programmers that distributed locks are a feasible solution.

> I too hate this. Not just because the edge cases exist, but also because of the related property: it makes the system very hard to reason about.

I think this is a huge problem with the way we’re developing software now. Distributed systems are extremely difficult for a lot of reasons, yet they’re often our first choice when developing even small systems!

At $COMPANY we have hundreds of lambdas, DocumentDB (btw, that is hell in case you’re considering it) and other cloud storage and queuing components. On-call and bug hunts are basically quests to find some corner-case race condition, timing problem, read-after-write assumption, etc.

I’m ashamed to say, we have reads wrapped in retry loops everywhere.

The whole thing could have been a Rails app built by a fraction of the team, with a massive increase in reliability, a system that’s easier to reason about, and a better time delivering features.

You could say we’re doing it wrong, and you’d probably be partly right for sure, but I’ve done consulting for a decade at dozens of other places and it always seems like this.


> You could say we’re doing it wrong, and you’d probably be partly right for sure, but I’ve done consulting for a decade at dozens of other places and it always seems like this.

The older I get, the more I think this is a result of Conway's law and that a lot of this architectural cruft stems from designing systems around communication boundaries rather than things that make technical sense.

Monolithic apps like Rails only happen under a single team or teams that are so tightly coupled people wonder whether they should just merge.

Distributed apps are very loosely coupled, so it's what you would expect to get from two teams that are far apart on the org chart.

Anecdotally, it mirrors what I've seen in practice. Closely related teams trust each other and are willing to make a monolith under an assumption that their partner team won't make it a mess. Distantly related teams play games around ensuring that their portion is loosely coupled enough that it can have its own due dates, reliability, etc.

Queues are the king of distantly coupled systems. A team's part of a queue-based app can be declared "done" before the rest of it is even stood up. "We're dumping stuff into the queue, they just need to consume it" or the inverse "we're consuming, they just need to produce". Both sides of the queue are basically blind to each other. That's not to say that all queues are bad, but I have seen a fair few queues that existed basically just to create an ownership boundary.

I once saw an app that did bidirectional RPC over message queues because one team didn't believe the other could/would do retries, on an app that handled single digit QPS. It still boggles my mind that they thought it was easier to invent a paradigm to match responses to requests than it was to remind the other team to do retries, or write them a library with retries built in, or just participate in bleeping code reviews.


> once saw an app that did bidirectional RPC over message queues

Haha I've seen this anti-pattern too (although I think it's in the enterprise patterns book??). It would bring production to a grinding halt every night. Another engineer and I stayed up all night and replaced it with a simple REST API.


I once saw a REST API built with bidirectional queues. There was a “REST” server that converted HTTP to some weird custom format and an “app” server with “business logic”, with tons of queues in between. It was massively over complicated and never made it to production. I won’t even describe what the database looked like.


I see the same. All this complexity to handle a few requests/second... but at least we can say it's "cloud native."


Same thing where I work now. Many experienced developers waste a huge chunk of their time trying to wrap their heads around the communication patterns and edge cases of their Django microservices. Much more complex than an equivalent Rails monolith, even though Ruby and Rails both have their issues and could be replaced by more modern tech in 2024.


This is rather misleading: the FLP theorem talks about fully asynchronous networks with unbounded delay. Partial synchrony is a perfectly reasonable assumption and allows atomic broadcast and locking to work perfectly well, even if there is only an unknown but finite bound on network delay.


Notice I did not mention FLP.

Atomic Broadcast (via Paxos or Raft) does not depend on partial synchrony assumptions to maintain its safety properties.

Your internet or intranet networks are definitely asynchronous, and assuming delays are bounded is a recipe for building crappy systems that will inevitably fail on you in hard-to-debug ways.


Had you read on, you'd have seen that I am discussing this very point:

> leader election will only ever be eventually correct... So you’ll always need to be prepared to detect and fence off work done by a previous leader.
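
To make "detect and fence off work done by a previous leader" concrete, here's a minimal fencing-token sketch (the names and API are made up, not from the article): the lease service hands out a monotonically increasing token with each grant, and the protected resource rejects writes carrying an older token, so a deposed leader that wakes up late gets rejected instead of corrupting state.

  # Minimal fencing-token sketch (hypothetical API, not from the article).
  class FencedStore:
      def __init__(self):
          self.highest_token = 0
          self.data = {}

      def write(self, token, key, value):
          if token < self.highest_token:
              raise PermissionError(f"stale fencing token {token}")
          self.highest_token = token
          self.data[key] = value

  store = FencedStore()
  store.write(token=33, key="job", value="written by leader 33")
  store.write(token=34, key="job", value="written by leader 34")
  try:
      store.write(token=33, key="job", value="late write from the old leader")
  except PermissionError as e:
      print("rejected:", e)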


I'm sorry, I didn't mean to bash your post. I am not familiar with S3 and maybe what you describe is a perfectly safe solution for S3 and certain classes of usage.

I could not get past the point where you promulgate the idea that ZK can be used to implement locks.

Traditionally a 'lock' guarantees mutual exclusion between threads or processes.

"Distributed locks" are not locks at all. They look the same from API perspective, but they have much weaker properties. They cannot be used to guarantee mutual exclusion.

I think any mention of distributed locks / leader election should come with a giant warning: THESE LOCKS ARE NOT AS STRONG AS THE ONES YOU ARE USED TO. Skipping this warning is doing a disservice to your readers.


Reliable network communication is also proven to be impossible [1], yet it happens all the time. Yes, sometimes it fails, but it still “works”.

[1] https://en.wikipedia.org/wiki/Two_Generals%27_Problem


There are some serious flaws in your reasoning.

TCP guarantees order [as long as the connection is active] but it is far from being 'perfectly reliable'.

Example: sender sends, connection drops, sender has no idea whether the receiver received.

In other words, it works until it doesn't. The fact that sometimes it doesn't means it's not perfect.

TCP is a great tool but it doesn't violate the laws of physics. The Two Generals' Problem is and will always be unsolvable.


Yes, but the vast majority of network traffic these days is TCP and very, very rarely does that cause a problem because applications already need to have logic to handle failures which cannot be solved at the transport level. There is a meaningful difference between theoretically perfect and close enough to build even enormous systems with high availability.


Rounding up 'usually works' to 'is reliable' is a recipe for building crappy systems.

Rounding down 'usually works' to 'it's not perfect and we need to handle edge cases' is how you build dependable systems.

Your first comment seemed very much in the first camp to me.


Your second camp is the latter half of my first sentence. As a simple example, the transport layer cannot prevent a successfully-received message from being dropped by an overloaded or malfunctioning server, duplicate transmissions due to client errors, etc. so most applications have mechanisms to indicate status beyond simple receipt, timeouts to handle a wide range of errors only some of which involve the transport layer, and so forth. Once you have that, most applications can tolerate the slight increase in TCP failures which a different protocol would prevent.


> sometimes it fails

That failures are a possibility does not concern me. That the failures are not fully characterized does.


I literally had this argument with David Schwartz of Ripple about our lack of a consensus protocol.


They both reduce to a Paxos-style atomic broadcast, which is in fact possible, although the legend is that Leslie Lamport was trying to prove it impossible and accidentally found a way.


> They both reduce to a Paxos-style atomic broadcast

Atomic Broadcast guarantees order of delivery. It does not (and cannot) guarantee timing of delivery, which is what people want and expect when using distributed locks / leader election.


Ordering gives you a leader or lock holder: the first claimant in the agreed ordering wins.

If you're saying "what if everything's down and we never get responses to our leadership bids", then yeah, the data center could burn down or we could lose electricity, too.


Ordering gives you ... ordering. And nothing more.

  Process 1 receives:
    3:00 [1] "P1 is leader"
    3:01 [2] "P2 is leader"

  Process 2 receives:
    3:00 [1] "P1 is leader"
    4:00 [2] "P2 is leader"

This is perfectly valid Atomic Broadcast. Order is maintained.

However, from 3:01 to 4:00 you have 2 leaders (or 2 processes holding the lock).

Don't use ABCast to do locking / leader election for your "user-space" application!


Good point, thanks


I remember at Oracle they built systems to shut down the previous presumed leader to definitively know it wasn't ghosting.


Yep, the "STONITH" technique [1]. But programmatically resetting a node over a network/RPC call might not work if inter-node network comms are down for that node while it can still access shared storage via other networks... Oracle's HA fencing doc mentions other methods too, like IPMI LAN fencing and SCSI persistent reservations [2].

[1] https://en.wikipedia.org/wiki/STONITH

[2] https://docs.oracle.com/en/operating-systems/oracle-linux/8/...


They had access to the ILOM and a much more durable way to STONITH. Of course every link can "technically" fail, but this brought it to such an unreasonable number of 9s that further failure modes felt unwarranted to consider.


Yep, and ILOM access probably happens over the management network and can hardware-reset the machine, so data-plane inter-node network issues and any OS-level brownouts won't get in the way.


Indeed, a slow node looks like a dead node for a while, until it isn’t.

At some point, the difference between distributed systems that work well and those that do not comes down to an “art of tuning timeouts and retries”.

Also, nothing in production is perfect, so we should always consider failures and their impact when writing code for distributed systems.

And we will still make mistakes…


> …which is proven to be impossible

For some definition of impossible, given that many systems utilise them effectively. Not all corner cases or theoretical failure modes are relevant to everyone.


Many systems are buggy, periodically running into corner cases that are borderline impossible to debug.


Sometimes it's enough to detect these cases and reset the system to its stable state.

For example, bicycle wheels are bistable systems, but they usually stay in their useful state, so it does not matter in practice.


Yes, and yet those systems still work and deliver real value all day, every day. If every company's Rollbar I've ever seen is any measure, good software can have millions of faults and still work for users.


Can you point me in the right direction to understand ZooKeeper’s fatal flaw?

I know you linked the first paper, but yeah candidly I don’t wish to read the full paper

Don’t mean this sarcastically whatsoever. Genuine interest.


The fatal flaw is calling them "locks".

When programmers think of locks, they think of something that can be used to guarantee mutual exclusion.

Distributed locks have edge cases where mutual exclusion is violated.

Implementation does not matter.

e.g. imagine someone shows you a design for a perpetual motion machine. You don't need to know the details to know it doesn't work! It would violate the laws of physics!

Similarly, anyone telling you they created an implementation of a distributed lock that is safe is claiming their system breaks the laws of information theory.

"Distributed locks" are at best contention-reduction mechanisms. i.e. they can keep multiple processes from piling up and slowing each other down.

Some Paxos implementations, for example, use leader election to streamline the protocol and achieve high throughput. But Paxos safety does NOT depend on it: if there are multiple leaders active (which will inevitably happen), the protocol still guarantees its safety properties.


For me, I almost stopped reading at the assertion that clock drift doesn't matter. They clearly didn't think through the constant fight that would occur over who the leader actually was and just hand-waved it away as 'not an issue.' They need to remove time from their equation completely if they want clock drift to not matter.


Part of me thinks that clock drift would be reliably biased toward a particular node, not leapfrogging between two nodes.


It’s deeper than that. See the paper on time in distributed systems by Lamport: https://lamport.azurewebsites.net/pubs/time-clocks.pdf

There’s a relativity-like issue where it’s impossible to have a globally consistent view of time.

See IR2 for how to synchronize physical time in a distributed system.

“(a) If P_i sends a message m at physical time t, then m contains a timestamp T_m = C_i(t). (b) Upon receiving a message m at time t', process P_j sets C_j(t') equal to maximum(C_j(t' - 0), T_m + μ_m).”

I think the formula is saying that the clocks will only ever increase (i.e. drift upwards). If so, then you could imagine two processes leapfrogging if one of them sends a message that bumps the other’s clock, then that one sends a message back that bumps the first.
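
Here's how I read that rule as code (a toy sketch; mu stands for the assumed minimum transmission delay), including the leapfrogging case where each message bumps the receiver's clock forward:

  # Toy sketch of the quoted rule: clocks tick forward on their own, and a
  # received timestamp can only ever push the receiver's clock further ahead.
  class Clock:
      def __init__(self, rate=1.0):
          self.value = 0.0
          self.rate = rate                  # a fast clock has rate > 1

      def tick(self, dt):
          self.value += dt * self.rate

      def send(self):
          return self.value                 # T_m = C_i(t)

      def receive(self, t_m, mu):
          # C_j(t') = max(C_j(t' - 0), T_m + mu): clocks only jump forward
          self.value = max(self.value, t_m + mu)

  p1, p2 = Clock(), Clock(rate=1.01)        # p2 runs slightly fast
  for _ in range(3):
      p1.tick(1); p2.tick(1)
      p2.receive(p1.send(), mu=0.1)         # p1 -> p2 bumps p2 if it's behind
      p1.tick(1); p2.tick(1)
      p1.receive(p2.send(), mu=0.1)         # p2 -> p1 bumps p1 past its own time
      print(round(p1.value, 2), round(p2.value, 2))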

But I’m curious how it behaves if one of the clocks is running faster, e.g. a satellite has a physically different sense of time than an observer on the ground.

Also note the paper claims you can’t get rid of clocks if you want to totally order the events.


It's a truly fantastic paper, but as with many such results, the fact that it's impossible to have a perfectly consistent global view of time doesn't mean it's not possible to have a near-perfect one. The blog mentioned AWS's Timesync, which lets you synchronize to within microseconds (https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-ti...), and Google's TrueTime is used to give ranges of time that you're guaranteed to be within (https://cloud.google.com/spanner/docs/true-time-external-con...).


It's near-perfect ... until it isn't and everything goes to shit in a cascading wtafc (what-the-actual-fuck-catastrophe).

1ms is an extremely long time in computer land where decisions are made in nanoseconds. There are 1000 nanoseconds per millisecond.


1000000 nanoseconds per millisecond.


Think you mean micro rather than milli


Yes thank you!


> Also note the paper claims you can’t get rid of clocks if you want to totally order the events.

The order of events isn't important here. We only care about the 'last event' which everyone can agree on; they just can't agree on when it happened.

In other words, they can all agree that there is a leader; we just don't know if that leader is still alive or not. The best thing to do is simply make the leader election deterministically random:

1. 'seed' the file with a 'no leader' file.

2. at randomly short intervals (max latency you want to deal with, so like once every few seconds or so), try to claim the file by using the hash of the current file as your etag. The winner is the leader.

3. Once there is a leader and you are the leader, every N seconds, update the file with a new number, using the hash of the current file as the etag.

4. If you are not the leader, every N*(random jitter)*C (where C>N*2, adjust for latency), attempt to take the file using the same method above. If you fail, retry again in N*(random jitter)*C.

5. If the leader is taken, you are not the leader until you've held leadership for at least N seconds.

This doesn't necessarily remove the clock. However, most -- if not all -- modern clocks will agree that the length of a second is within a few nanoseconds of any other clock, regardless of how accurate its total count of seconds is since the epoch. It also probably doesn't work, but it probably isn't that far away from one that does.
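
Roughly, in code (against a hypothetical object store with conditional puts keyed on the current content hash; the names are made up and real S3-style conditional writes differ in the details):

  # Rough sketch of the claim/renew loop above (hypothetical store API).
  import hashlib, json, random, time

  class PreconditionFailed(Exception):
      pass

  class Store:
      """In-memory stand-in for a store with compare-and-swap on the etag."""
      def __init__(self):
          self.body = json.dumps({"leader": None, "term": 0}).encode()  # step 1 seed

      def etag(self):
          return hashlib.sha256(self.body).hexdigest()

      def put_if_match(self, body, etag):
          if etag != self.etag():
              raise PreconditionFailed("etag mismatch")
          self.body = body

  def try_claim(store, me):
      """Steps 2-3: claim or renew by using the current hash as the etag."""
      current = store.etag()
      doc = json.loads(store.body)
      new_body = json.dumps({"leader": me, "term": doc["term"] + 1}).encode()
      try:
          store.put_if_match(new_body, current)
          return True             # CAS won: we are (still) the leader
      except PreconditionFailed:
          return False            # someone else updated the file first

  N, C = 2, 5                     # renewal period and back-off factor (step 4)
  store = Store()
  if try_claim(store, "node-a"):
      print("node-a is leader for this term")
  else:
      time.sleep(N * random.random() * C)   # jittered back-off before retrying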

