> WALs, and related low-level logging details, are critical for database systems that care deeply about durability on a single system. But the modern database isn’t like that: it doesn’t depend on commit-to-disk on a single system for its durability story. Commit-to-disk on a single system is both unnecessary (because we can replicate across storage on multiple systems) and inadequate (because we don’t want to lose writes even if a single system fails).
And then a bug crashes your database cluster all at once, and now instead of missing seconds you miss minutes, because some smartass thought "surely if I send a request to 5 nodes, some of it will land on disk in the reasonably near future?"
I love how this industry invents best practices that are actually good, and then people invent badly researched reasons to just... not do them.
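To make that concrete, here is a minimal sketch (illustrative only, not any real database's code) of the gap between acknowledging a write once it is buffered and acknowledging it only after fsync. If a correlated crash or power loss hits the whole cluster, only the fsync'd version is guaranteed to survive on any given node:

    # Hypothetical sketch: ack-after-buffer vs. ack-after-fsync.
    # With a correlated crash, data still sitting in the OS page
    # cache is lost on every node at once.
    import os

    def ack_after_buffer(path: str, record: bytes) -> None:
        # Bytes land in the OS page cache only; a power loss can drop them.
        with open(path, "ab") as f:
            f.write(record)
        # "ack" returned here -- durability on this node is NOT guaranteed

    def ack_after_fsync(path: str, record: bytes) -> None:
        # Force the record to stable storage before acknowledging.
        with open(path, "ab") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())
        # "ack" returned here -- this node has the record on disk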
But we know this is not actually robust, because storage and power failures tend to be correlated. The most recent Jepsen analysis again highlights that this is flawed thinking: https://jepsen.io/analyses/nats-2.12.1
The Aurora paper [0] goes into detail on correlated failures.
> In Aurora, we have chosen a design point of tolerating (a) losing an entire AZ and one additional node (AZ+1) without losing data, and (b) losing an entire AZ without impacting the ability to write data. [..] With such a model, we can (a) lose a single AZ and one additional node (a failure of 3 nodes) without losing read availability, and (b) lose any two nodes, including a single AZ failure and maintain write availability.
As for why this can be considered durable enough, section 2.2 gives an argument based on their MTTR (mean time to repair) for storage segments:
> We would need to see two such failures in the same 10 second window plus a failure of an AZ not containing either of these two independent failures to lose quorum. At our observed failure rates, that's sufficiently unlikely, even for the number of databases we manage for our customers.
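To spell out the quorum arithmetic behind the quoted design (the paper keeps 6 copies, 2 per AZ across 3 AZs, with a write quorum of 4 and a read quorum of 3), here is a small worked sketch:

    # Worked sketch of the Aurora quorum arithmetic described in the quotes.
    COPIES = 6        # 2 copies in each of 3 AZs
    WRITE_QUORUM = 4
    READ_QUORUM = 3

    def surviving(copies_lost: int) -> int:
        return COPIES - copies_lost

    # (b) lose any two nodes, e.g. a whole AZ (2 copies): writes still possible
    assert surviving(2) >= WRITE_QUORUM      # 4 >= 4

    # (a) lose an AZ plus one more node (3 copies): reads still possible,
    # so the data can be repaired and write quorum rebuilt
    assert surviving(3) >= READ_QUORUM       # 3 >= 3
    assert surviving(3) < WRITE_QUORUM       # writes blocked until repair

    # Losing a 4th copy (an AZ plus two independent failures inside the
    # repair window) is what it takes to lose read quorum -- the case the
    # MTTR argument above calls sufficiently unlikely.
    assert surviving(4) < READ_QUORUM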
The biggest lie we’ve been told is that databases require global consistency and a global clock. Traditional databases are still operating with Newtonian assumptions about absolute time, while the real world moves according to Einstein’s relativistic theory, where time is local and relative. You don’t need global order, and you don’t need a global clock.
You need a clock but you can have more than one. This is an important distinction.
Arbitrating differences in relative ordering across different observer clocks is what N-temporal databases are about. In databases we usually call the basic 2-temporal case “bitemporal”. The trivial 1-temporal case (which is a quasi-global clock) is what we call “time-series”.
The complexity is that N-temporality turns time into a true N-dimensional data type. These have different behavior than the N-dimensional spatial data types that everyone is familiar with, so you can’t use e.g. quadtrees as you would in the 2-spatial case and expect it to perform well.
There are no algorithms in the literature for indexing N-temporal types at scale. It is a known open problem. That’s why we don’t do it in databases except at trivial scales, where you can just brute-force the problem. (The theory problem is really interesting, but once you start poking at it you quickly see why no one has made any progress on it. It hurts the brain just to think about it.)
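As a concrete illustration of the 2-temporal case (my own toy sketch, not any particular database's API): each fact carries a valid-time interval (when it was true in the world) and a transaction-time interval (when the database believed it), a query pins both axes, and at trivial scales a brute-force scan over both dimensions is perfectly workable, which is exactly the regime described above.

    # Hypothetical bitemporal sketch: brute-force "as of" query over both time axes.
    from dataclasses import dataclass
    from datetime import date

    INF = date.max  # open-ended interval

    @dataclass
    class BitemporalFact:
        key: str
        value: str
        valid_from: date
        valid_to: date   # exclusive
        tx_from: date
        tx_to: date      # exclusive; INF means "current belief"

    def as_of(facts, key, valid_at: date, tx_at: date):
        """What did we believe at tx_at about the state of the world at valid_at?"""
        return [
            f for f in facts
            if f.key == key
            and f.valid_from <= valid_at < f.valid_to
            and f.tx_from <= tx_at < f.tx_to
        ]

    facts = [
        # Recorded on Jan 10 that the rate was 5% from Jan 1 onward...
        BitemporalFact("rate", "5%", date(2024, 1, 1), INF, date(2024, 1, 10), date(2024, 2, 1)),
        # ...then corrected on Feb 1: it was actually 4% from Jan 1.
        BitemporalFact("rate", "4%", date(2024, 1, 1), INF, date(2024, 2, 1), INF),
    ]

    print(as_of(facts, "rate", date(2024, 1, 15), date(2024, 1, 20)))  # old belief: 5%
    print(as_of(facts, "rate", date(2024, 1, 15), date(2024, 3, 1)))   # corrected belief: 4%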
Until the financial controller shows up, at the very least.
Also, even if it's not required, it makes reasoning about how systems work a hell of a lot easier. So for the vast majority that doesn't need massive throughput, sacrificing some speed for an easier-to-understand consistency model is a worthwhile tradeoff.
Pretty much all financial transactions are settled with a given date, not instantly.
Go sell some stocks; it takes 2 days to actually settle. (This may be hidden by your provider, but that's how it works.)
For that matter, the ultimate in BASE for financial transactions is the humble check.
That is a great example of "money out" that will only be settled at some time in the future.
There is a reason there is this notion of a "business day" and re-processing transactions that arrived out of order.
The deeper problem isn't global clocks or even strict consistency; it's the assumption that synchronous coordination is the default mechanism for correctness. That's the real Newtonian mindset: a belief that serialization must happen before progress is allowed. Synchronous coordination can enforce correctness, but it should not be the only mechanism to achieve it. Physics actually teaches the opposite assumption: time is relative and local, not globally ordered. Yet traditional databases were designed as if absolute time and global serialization were fundamental laws rather than conveniences. We treat global coordination as inevitable when it's really just a historical design choice, not a requirement for correctness.
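One well-worn example of correctness without synchronous coordination, offered as an illustration rather than anything from the comment above, is a CRDT-style grow-only counter: replicas accept writes locally and converge by merging, and because the merge is commutative, associative, and idempotent, no global clock or serialization is needed.

    # Illustrative sketch of a grow-only counter CRDT (G-counter).
    from collections import defaultdict

    class GCounter:
        def __init__(self, replica_id: str):
            self.replica_id = replica_id
            self.counts = defaultdict(int)   # replica_id -> local increment count

        def increment(self, n: int = 1) -> None:
            self.counts[self.replica_id] += n          # purely local, no coordination

        def merge(self, other: "GCounter") -> None:
            for rid, c in other.counts.items():
                self.counts[rid] = max(self.counts[rid], c)   # element-wise max

        def value(self) -> int:
            return sum(self.counts.values())

    a, b = GCounter("a"), GCounter("b")
    a.increment(3)
    b.increment(2)
    a.merge(b); b.merge(a)    # merges in any order, any number of times, converge
    assert a.value() == b.value() == 5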
Happens all the time (ignoring best practices because it's convenient, or doing something different "just because"), literally everywhere, including normal society.