
See quectophoton's comment: the requirement that continuation bytes always start with the bits 10 is useful if a parser jumps in at a random offset or, more commonly, if the text stream gets fragmented. This was actually a major concern when UTF-8 was devised in the early 90s, as transmission was much less reliable than it is today.
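
To make the resynchronization point concrete, here is a minimal Python sketch. The helper names `is_continuation` and `next_boundary` are mine, purely for illustration. It assumes the input is otherwise well-formed UTF-8 and only shows how a reader dropped at an arbitrary byte offset can skip forward to the next character start.

```python
def is_continuation(byte: int) -> bool:
    """True if the byte has the form 10xxxxxx, i.e. a UTF-8 continuation byte."""
    return (byte & 0b1100_0000) == 0b1000_0000


def next_boundary(data: bytes, offset: int) -> int:
    """Return the first index >= offset that is not a continuation byte.

    Hypothetical helper: it assumes `data` is well-formed UTF-8 apart from
    the fact that we may have landed in the middle of a character.
    """
    while offset < len(data) and is_continuation(data[offset]):
        offset += 1
    return offset


data = "a€b".encode("utf-8")      # 'a', then the three bytes of '€', then 'b'
start = next_boundary(data, 2)    # offset 2 lands inside the '€' sequence
print(data[start:].decode("utf-8"))
```

At most three bytes ever need to be skipped, since a character takes at most four bytes and only the first is not a continuation byte.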


Addendum: This was posted to the front page today: https://doc.cat-v.org/bell_labs/utf-8_history

It also notes that UTF-8 keeps NUL and '/' bytes from appearing inside the encoding of any other character, which would otherwise break C strings and Unix path handling, respectively.
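
The NUL and '/' property is easy to check for yourself. A rough Python sketch follows; the function name `contains_ascii_byte` and the sample characters are my own choices, not taken from the linked page. It relies on every byte of a multi-byte UTF-8 sequence having its high bit set, so none can equal 0x00 or 0x2F. UTF-16 is included only as a contrast.

```python
def contains_ascii_byte(ch: str, encoding: str) -> bool:
    """True if any byte of the encoded character is in the ASCII range (< 0x80)."""
    return any(b < 0x80 for b in ch.encode(encoding))


# Arbitrary non-ASCII sample characters, chosen for illustration.
for ch in "é€日𝄞":
    assert not contains_ascii_byte(ch, "utf-8")

# Contrast: U+2F00 in big-endian UTF-16 is the bytes 0x2F 0x00,
# i.e. a '/' followed by a NUL.
kangxi_one = chr(0x2F00)
print(kangxi_one.encode("utf-16-be"))  # contains both problem bytes
print(kangxi_one.encode("utf-8"))      # every byte is >= 0x80
```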



