Oh god, this again. One word: "History". No one thought we would need more than 16 bits (65k chars) to represent all the world's written languages. Then it happened. There must be no fewer than one thousand individually authored blog posts and technical articles on this matter. Win32, Java, and Qt all suffer from the same UTF-16 internal representation. There has been endless discussion over the last 10 years about how to change these frameworks to use a UTF-8 internal representation. It is a crazy hard problem.
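
To make the leak concrete, here's a minimal C11 sketch (my own toy example, not code from any of those frameworks): a single character outside the original 16-bit range has to be stored as two UTF-16 code units, a surrogate pair, so "one unit == one character" silently stops being true.

  #include <stdio.h>
  #include <uchar.h>

  int main(void) {
      /* U+1F600 GRINNING FACE sits outside the original 16-bit range */
      char16_t emoji[] = u"\U0001F600";

      /* Prints 2: the code point is stored as the surrogate pair
         0xD83D 0xDE00, not as a single 16-bit unit. */
      printf("UTF-16 code units: %zu\n",
             sizeof emoji / sizeof emoji[0] - 1);
      printf("units: 0x%04X 0x%04X\n",
             (unsigned)emoji[0], (unsigned)emoji[1]);
      return 0;
  }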


The tragic part is how brief the window was between "ASCII and a mess of code pages" and the problem actually getting solved with Unicode 2.0 and UTF-8.

Unicode 1.0 was in 1991, UTF-8 happened a year later, and Unicode 2.0 (where more than 65,536 characters became “official”, and UTF-8 was the recommended choice) was in 1996.

That means if you were green-fielding a new bit of tech in 1991, you likely decided 16 bits per character was the correct approach. But in 1992 it started to become clear that maybe a variable-width encoding (with 8 bits as the base character size) was on the horizon. And by 1996 it was clear that fixed 16-bit characters were a mistake.

But that 5-year window was an extremely critical time in computing history: Windows NT was invented, and so were Java, JavaScript, and a bunch of other things. By the time the mistake was clear, it was too late: huge swaths of what would become today's technical landscape had set the problem in stone.

The UNIXes only ended up with the "right" technical choice because it was already too hard to move from ASCII to 16-bit characters… but that laziness in moving off of ASCII ultimately paid off once it became clear that 16 bits per character was the wrong choice in the first place. Otherwise UNIX would have suffered the same fate.
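
To see why the laziness paid off, here's a minimal C sketch (my own example): ASCII text is already valid UTF-8 byte for byte, and every byte of a multi-byte UTF-8 sequence is >= 0x80, so old byte-oriented UNIX code (strchr for '/', NUL-terminated strings, and so on) keeps working unchanged.

  #include <stdio.h>
  #include <string.h>

  int main(void) {
      const char *ascii = "/usr/bin";               /* plain ASCII          */
      const char *utf8  = "/usr/bin/caf\xC3\xA9";   /* same prefix + U+00E9 */

      /* The ASCII prefix is byte-identical under UTF-8... */
      printf("ascii prefix unchanged: %d\n",
             memcmp(ascii, utf8, strlen(ascii)) == 0);

      /* ...and the non-ASCII code point only uses bytes >= 0x80,
         so searching for an ASCII byte like '/' can never misfire. */
      printf("last '/' at offset %td\n", strrchr(utf8, '/') - utf8);
      return 0;
  }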


For a while the brain-dead UTF-32 encoding was popular in the Unix/Linux world.


Exactly: Long live "wchar_t".
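
For anyone who hasn't been bitten yet, the joke is that wchar_t doesn't even have a fixed width across platforms. A minimal C sketch (the output depends on the platform, which is exactly the problem):

  #include <stdio.h>
  #include <wchar.h>

  int main(void) {
      /* Typically prints 2 on Windows (UTF-16, surrogates again)
         and 4 on Linux/glibc and macOS (UTF-32). */
      printf("sizeof(wchar_t) = %zu bytes\n", sizeof(wchar_t));
      return 0;
  }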



