That's not a healthy / functioning open source community. Fewer than 30 people have made more than 10 commits; most of the 160 were "drive-by" contributors who fixed a single small thing.
On the face of it, that sounds healthy. But dig in further, and you discover that only the top 10 or so contributors have enough activity to make their personal contribution graph anything other than a flat line on zero.
The vast majority of the work here is being done by a very small group of people - likely those paid by the commercial sponsor (a random sample of the top few suggests that well over half of those top contributors fall into that category).
If those people aren't paid to work on the project any more, it will likely die very quickly.
> On the face of it, that sounds healthy. But dig in further, and you discover that only the top 10 or so contributors have enough activity to make their personal contribution graph anything other than a flat line on zero.
This is something I always do manually and it is annoying. The number of contributors we see on the front page of a project on GitHub is misleading and can easily be faked by a malicious actor, because a user who corrected a typo 5 years ago is counted as a contributor.
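The manual check can be scripted against GitHub's /contributors endpoint, which reports a per-contributor commit count. A rough sketch (unauthenticated calls are rate-limited; the repo is just the one mentioned elsewhere in this thread):

```python
# Pull per-contributor commit counts from GitHub's /contributors endpoint
# and see how top-heavy the distribution really is.
import json
import urllib.request

def contributor_counts(owner, repo):
    counts = []
    page = 1
    while True:
        url = (f"https://api.github.com/repos/{owner}/{repo}/contributors"
               f"?per_page=100&page={page}")
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:
            return counts
        counts.extend((c["login"], c["contributions"]) for c in batch)
        page += 1

counts = contributor_counts("nats-io", "nats-server")
serious = [c for _, c in counts if c > 10]
print(f"{len(counts)} contributors, {len(serious)} with more than 10 commits")
```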
My anecdotal evidence confirms this. Friends in my home country are hooked on Telegram meme groups and as a result they’re spreading [Russian propaganda] conspiracy theories at alarming levels.
(Russia is really good at weaponizing memes, but I don’t mean to single it out; the US has also been very successfully influencing the same country for decades, via Hollywood movies for example.)
“Not backing up the same file twice” is not the same thing as deduplicating encrypted data, as encryption has no relevance there. You can do that with or without encryption.
Telling people "here's a recipe to do locks" should come with a giant flashing sign that say: "this is not an actual lock (as in in-process locks) -- locks in distributed systems are impossible, this cannot be used as a recipe for mutual exclusion"
Leader election and distributed locking reduce to the same problem… which is proven to be impossible. That means in some edge case it will fail on you; is your system handling those cases?
I didn’t read past this:
> Systems like Apache ZooKeeper or Postgres (via Advisory Locks) provide the required building blocks for this
ZooKeeper is the original sin. Convincing a whole generation of programmers that distributed locks are a feasible solution.
This is my biggest pet peeve in distributed systems.
——
And if you don’t believe me, maybe you’ll trust Kyle K. of Jepsen fame:
> However, perfect failure detectors are impossible in asynchronous networks.
'Technically' intractable problems are solvable just fine in a way that is almost as useful as solving them completely if you can achieve one of two things:
* Reliably identify when you've encountered an unsolvable case (usefulness of this approach depends on the exact problem you're solving).
or
* Reduce the probability of unsolvable cases/incorrect solutions to a level low enough to not actually happen in practice.
'Technically' GUIDs are impossible, reliable network communication (TCP) is impossible, O(n²)-time functions will grow to unusably large running times - but in practice all of these things are used constantly to solve real problems.
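The GUID case is a good example of the second bullet: the collision probability can be driven so low it never happens in practice. A quick back-of-the-envelope check for random (version 4) UUIDs, which carry 122 random bits:

```python
# Birthday-bound estimate for v4 UUID collisions: 122 random bits make the
# probability negligible at realistic scales.
def uuid4_collision_probability(n):
    space = 2 ** 122
    return n * (n - 1) / (2 * space)  # birthday approximation, valid while p << 1

for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,} IDs -> collision probability ~ {uuid4_collision_probability(n):.3e}")
```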
"Distributed locks" are at best a contention-reduction mechanism. They cannot be used to implement mutual exclusion that is _guaranteed_ to work.
I've seen way too many systems where people assume TCP == perfectly reliable and distributed locks == mutual exclusion. Which of course is not the case.
> Convincing a whole generation of programmers that distributed locks are a feasible solution.
I too hate this. Not just because the edge cases exist, but also because of the related property: it makes the system very hard to reason about.
Questions that should be simple become complicated. What happens when the distributed locking system is down? What happens when we reboot all the nodes at once? What if they don't come down at exactly the same time and there's leader churn for like 2 minutes? Etc, etc.
Those questions should be fairly simple, but become something where a senior dev is having to trace codepaths and draw on a whiteboard to figure it out. It's not even enough to understand how a single node works in depth; they have to figure out not only how this node works but also how its state might impact another node's.
All of this is much simpler in leaderless systems (where the leader system is replaced with idempotency or a scheduler or something else).
I very strongly prefer avoiding leader systems; it's a method of last resort when literally nothing else will work. I would much rather scale a SQL database to support the queries for idempotency than deal with a leader system.
I've never seen an idempotent system switch to a leader system, but I've sure seen the reverse a few times.
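For what it's worth, the idempotency route can be as small as one unique key. A minimal sketch (table and column names are made up; SQLite is used only so it runs standalone):

```python
# Idempotency via a unique constraint instead of a leader or distributed lock.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (idempotency_key TEXT PRIMARY KEY, result TEXT)")

def handle_once(key, work):
    try:
        # Claim the key first; the unique constraint makes concurrent duplicates lose.
        db.execute("INSERT INTO processed (idempotency_key, result) VALUES (?, NULL)", (key,))
        db.commit()
    except sqlite3.IntegrityError:
        db.rollback()
        return "duplicate, skipped"
    result = work()  # a real system also has to handle a crash between claim and result
    db.execute("UPDATE processed SET result = ? WHERE idempotency_key = ?", (result, key))
    db.commit()
    return result

print(handle_once("order-123", lambda: "charged card"))
print(handle_once("order-123", lambda: "charged card"))  # second attempt is a no-op
```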
>> Convincing a whole generation of programmers that distributed locks are a feasible solution.
> I too hate this. Not just because the edge cases exist, but also because of the related property: it makes the system very hard to reason about.
I think this is a huge problem with the way we’re developing software now. Distributed systems are extremely difficult for a lot of reasons, yet they’re often our first choice when developing even small systems!
At $COMPANY we have hundreds of lambdas, DocumentDB (btw, that is hell in case you’re considering it) and other cloud storage and queuing components. On-call and bug fixing are basically quests to find some corner-case race condition/timing problem, read-after-write assumption, etc.
I’m ashamed to say, we have reads wrapped in retry loops everywhere.
The whole thing could have been a Rails app with a fraction of the team size, a massive increase in reliability, a system that's easier to reason about, and a better time delivering features.
You could say we’re doing it wrong, and you’d probably be partly right for sure, but I’ve done consulting for a decade at dozens of other places and it always seems like this.
> You could say we’re doing it wrong, and you’d probably be partly right for sure, but I’ve done consulting for a decade at dozens of other places and it always seems like this.
The older I get, the more I think this is a result of Conway's law and that a lot of this architectural cruft stems from designing systems around communication boundaries rather than things that make technical sense.
Monolithic apps like Rails only happen under a single team or teams that are so tightly coupled people wonder whether they should just merge.
Distributed apps are very loosely coupled, so it's what you would expect to get from two teams that are far apart on the org chart.
Anecdotally, it mirrors what I've seen in practice. Closely related teams trust each other and are willing to make a monolith under an assumption that their partner team won't make it a mess. Distantly related teams play games around ensuring that their portion is loosely coupled enough that it can have its own due dates, reliability, etc.
Queues are the king of distantly coupled systems. A team's part of a queue-based app can be declared "done" before the rest of it is even stood up. "We're dumping stuff into the queue, they just need to consume it" or the inverse "we're consuming, they just need to produce". Both sides of the queue are basically blind to each other. That's not to say that all queues are bad, but I have seen a fair few queues that existed basically just to create an ownership boundary.
I once saw an app that did bidirectional RPC over message queues because one team didn't believe the other could/would do retries, on an app that handled single digit QPS. It still boggles my mind that they thought it was easier to invent a paradigm to match responses to requests than it was to remind the other team to do retries, or write them a library with retries built in, or just participate in bleeping code reviews.
> once saw an app that did bidirectional RPC over message queues
Haha I've seen this anti-pattern too (although I think it's in the enterprise patterns book??). It would bring production to a grinding halt every night. Another engineer and I stayed up all night and replaced it with a simple REST API.
I once saw a REST API built with bidirectional queues. There was a “REST” server that converted HTTP to some weird custom format and an “app” server with “business logic”, with tons of queues in between. It was massively over complicated and never made it to production. I won’t even describe what the database looked like.
Same thing where I work now. Many experienced developers waste a huge chunk of their time trying to wrap their heads around their Django microservices' communication patterns and edge cases. Much more complex than an equivalent Rails monolith, even though Ruby and Rails both have their issues and could be replaced by more modern tech in 2024.
This is rather misleading; the FLP theorem is about fully asynchronous networks with unbounded delay. Partial synchrony is a perfectly reasonable assumption and allows atomic broadcast and locking to work perfectly well even if there is an unknown but finite bound on network delay.
Atomic Broadcast (via Paxos or RAFT) does not depend on partial synchrony assumptions to maintain its safety properties.
Your internet or intranet networks are definitely asynchronous, and assuming delays are bounded is a recipe for building crappy systems that will inevitably fail on you in hard-to-debug ways.
I'm sorry, I didn't mean to bash it. I am not familiar with S3 and maybe what you describe is a perfectly safe solution for S3 and certain classes of usage.
I could not get past the point where you promulgate the idea that ZK can be used to implement locks.
Traditionally a 'lock' guarantees mutual exclusion between threads or processes.
"Distributed locks" are not locks at all.
They look the same from API perspective, but they have much weaker properties. They cannot be used to guarantee mutual exclusion.
I think any mention of distributed locks / leader election should come with a giant warning: THESE LOCKS ARE NOT AS STRONG AS THE ONES YOU ARE USED TO.
Skipping this warning is doing a disservice to your readers.
Yes, but the vast majority of network traffic these days is TCP and very, very rarely does that cause a problem because applications already need to have logic to handle failures which cannot be solved at the transport level. There is a meaningful difference between theoretically perfect and close enough to build even enormous systems with high availability.
Your second camp is the latter half of my first sentence. As a simple example, the transport layer cannot prevent a successfully-received message from being dropped by an overloaded or malfunctioning server, duplicate transmissions due to client errors, etc. so most applications have mechanisms to indicate status beyond simple receipt, timeouts to handle a wide range of errors only some of which involve the transport layer, and so forth. Once you have that, most applications can tolerate the slight increase in TCP failures which a different protocol would prevent.
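Concretely, that application-level machinery tends to look like a timeout plus retries, made safe by sending the same key on every attempt so the server can deduplicate. A sketch (the URL and header name are illustrative, not any particular API):

```python
# Retry with a timeout; the repeated idempotency key lets the server drop duplicates.
import time
import urllib.request

def post_with_retries(url, body, key, attempts=5):
    last_error = None
    for attempt in range(attempts):
        req = urllib.request.Request(url, data=body, method="POST",
                                     headers={"Idempotency-Key": key})
        try:
            with urllib.request.urlopen(req, timeout=2.0) as resp:
                return resp.read()
        except OSError as exc:  # timeouts, connection errors, HTTP error statuses
            last_error = exc
            time.sleep(0.1 * 2 ** attempt)  # exponential backoff before retrying
    raise last_error
```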
They both reduce to a paxos style atomic broadcast, which is in fact possible although the legend is that Leslie Lamport was trying to prove it impossible and accidentally found a way.
> They both reduce to a paxos style atomic broadcast
Atomic Broadcast guarantees order of delivery.
It does not (cannot) guarantee timing of delivery.
Which is what people want and expect when using distributed lock / leader election.
Ordering gives you a leader or lock holder, first claimant in the agreed ordering wins.
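In code form that idea is tiny: once every replica sees the same totally ordered log of claims (which is what atomic broadcast buys you), they all compute the same holder. A toy sketch; note that nothing in it says whether that holder is still alive right now:

```python
# Derive the lock holder from a totally ordered log of claim/release events.
def current_holder(log):
    waiting = []  # claimants in broadcast order; the head holds the lock
    for event, node in log:
        if event == "claim":
            waiting.append(node)
        elif event == "release" and waiting and waiting[0] == node:
            waiting.pop(0)
    return waiting[0] if waiting else None

log = [("claim", "A"), ("claim", "B"), ("release", "A"), ("claim", "C")]
print(current_holder(log))  # every replica computes "B"
```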
If you're saying "what if everything's down and we never get responses to our leadership bids", then yeah, the data center could burn down or we could lose electricity, too.
Yep, the "STONITH" technique [1]. But programmatically resetting one node over a network/RPC call might not work, if internode-network comms are down for that node, but it can still access shared storage via other networks... The Oracle's HA fencing doc mentions other methods too, like IPMI LAN fencing and SCSI persistent reservations [2].
They had access to the ILOM and had some much more durable way to STONITH.
Of course every link can "technically" fail, but it brought it to such an unreasonable number of 9s that it felt unwarranted to consider.
Yep and ILOM access probably happens over the management network and can hardware-reset the machine, so the dataplane internode network issues and any OS level brownouts won't get in the way.
For some definition of impossible, given that many systems utilise them effectively. Not all corner cases or theoretical failure modes are relevant to everyone.
Yes, and yet those systems still work and deliver real value all day every day. If every company's Rollbar I've ever seen is the measure, good software can have millions of faults and still work for users.
When programmers think of locks, they think of something that can be used to guarantee mutual exclusion.
Distributed locks have edge cases where mutual exclusion is violated.
Implementation does not matter.
e.g. imagine someone shows you a design for a perpetual motion machine. You don't need to know the details to know it doesn't work! It would violate the laws of physics!
Similarly, anyone telling you they created an implementation of a distributed lock that is safe is claiming their system breaks the laws of information theory.
"Distributed locks" are at best contention-reduction mechanisms. i.e. they can keep multiple processes from piling up and slowing each other down.
[Some] Paxos variants, for example, use leader election to streamline the protocol and achieve high throughput.
But Paxos safety does NOT depend on it. If there are multiple leaders active (which will inevitably happen), the protocol still guarantees its safety properties.
For me, I almost stopped reading at the assertion that clock drift doesn't matter. They clearly didn't think through the constant fight that would occur over who the leader actually was and just hand-waved it away as 'not an issue.' They need to remove time from their equation completely if they want clock drift to not matter.
There’s a relativity-like issue where it’s impossible to have a globally consistent view of time.
See IR2 for how to synchronize physical time in a distributed system.
“(a) If Pi sends a message m at physical time t, then m contains a timestamp Tm = Ci(t). (b) Upon receiving a message m at time t', process Pj sets Cj(t') equal to maximum(Cj(t' − 0), Tm + μm).”
I think the formula is saying that the clocks will only ever increase (i.e. drift upwards). If so, then you could imagine two processes leapfrogging if one of them sends a message that bumps the other’s clock, then that one sends a message back that bumps the first.
But I’m curious how it behaves if one of the clocks is running faster, e.g. a satellite has a physically different sense of time than an observer on the ground.
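My reading of rules (a) and (b) as a sketch (local ticking between events is omitted, and mu stands for the minimum transmission delay): the receive rule only ever pushes a clock forward, which is where the leapfrogging comes from, and as far as I can tell a faster clock simply drags the slower ones upward toward it, since the max never moves a clock back.

```python
# Rule (a): stamp outgoing messages with the sender's clock.
# Rule (b): on receive, jump forward to max(own clock, timestamp + mu).
class PhysicalClock:
    def __init__(self, start):
        self.value = start

    def send(self):
        return self.value

    def receive(self, timestamp, mu):
        self.value = max(self.value, timestamp + mu)

a, b = PhysicalClock(100.0), PhysicalClock(95.0)
b.receive(a.send(), mu=1.0)  # b jumps forward to 101.0
a.receive(b.send(), mu=1.0)  # a jumps forward to 102.0: the clocks leapfrog upward
print(a.value, b.value)
```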
Also note the paper claims you can’t get rid of clocks if you want to totally order the events.
> Also note the paper claims you can’t get rid of clocks if you want to totally order the events.
The order of events isn't important here. We only care about the 'last event' which everyone can agree on; they just can't agree on when it happened.
In other words, they can all agree that there is a leader; we just don't know if that leader is still alive or not. The best thing to do is simply make the leader election deterministically random:
1. 'Seed' the file with a 'no leader' placeholder.
2. At short, randomized intervals (the max latency you want to deal with, so like once every few seconds or so), try to claim the file with a conditional write, using the hash of the current file as your etag precondition. The writer that succeeds is the leader.
3. Once there is a leader and you are the leader, every N seconds, update the file with a new number, again using the hash of the current file as the etag.
4. If you are not the leader, every N*(random jitter)*C (where C > N*2, adjust for latency), attempt to take the file using the same method above. If you fail, retry again in N*(random jitter)*C.
5. If you take the file, you are not the leader until you've held it for at least N seconds.
This doesn't necessarily remove the clock. However, most -- if not all -- modern clocks will agree that the length of a second is within a few nanoseconds of any other clock's, regardless of how accurate their total count of seconds since the epoch is. It also probably doesn't work, but it probably isn't that far away from one that does.
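The claim step is just a compare-and-swap on the file's etag. A rough sketch against a made-up in-memory store standing in for S3-style conditional writes (the heartbeat and challenge loops from steps 3-4 are left out for brevity):

```python
# Two nodes race to claim leadership via an etag-conditional write.
import hashlib
import json
import time

class Store:
    """Stand-in for an object store with etag-conditional writes."""
    def __init__(self, body):
        self.body = body

    def etag(self):
        return hashlib.sha256(self.body).hexdigest()

    def put_if_match(self, body, expected_etag):
        if self.etag() != expected_etag:
            return False                    # precondition failed: someone else wrote first
        self.body = body
        return True

store = Store(b'{"leader": null}')          # step 1: seed with a "no leader" placeholder

# Step 2: both nodes use the etag they last read.
etag_a = store.etag()
etag_b = store.etag()
claim_a = json.dumps({"leader": "node-a", "at": time.time()}).encode()
claim_b = json.dumps({"leader": "node-b", "at": time.time()}).encode()

print(store.put_if_match(claim_a, etag_a))  # True: node-a's conditional write lands first
print(store.put_if_match(claim_b, etag_b))  # False: the etag moved, node-b retries later
```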
That's the NATS Go _client_.
The server project is https://github.com/nats-io/nats-server 17k stars, 1.5k forks, 160 contributors