
Yep, you're right. Those two bytes are forbidden to prevent overlong encodings. Several multibyte sequences are forbidden for the same reason.
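To make the overlong problem concrete, here is a minimal Python sketch. It assumes the two bytes in question are 0xC0 and 0xC1, the lead bytes whose 2-byte sequences could only encode values that already fit in one byte. `naive_decode_2byte` and `strict_decode_fails` are hypothetical helper names for illustration, not part of any library.

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Decode the bit pattern 110xxxxx 10yyyyyy with no validity checks."""
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def strict_decode_fails(data: bytes) -> bool:
    """Return True if Python's built-in UTF-8 decoder rejects the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# 0xC1 0x81 naively yields U+0041 ('A'), which already has the 1-byte form 0x41.
overlong_a = bytes([0xC1, 0x81])
print(hex(naive_decode_2byte(*overlong_a)))

# A conforming decoder refuses the overlong 2-byte form...
print(strict_decode_fails(overlong_a))
# ...and also overlong multibyte forms such as E0 80 80 (a 3-byte U+0000).
print(strict_decode_fails(bytes([0xE0, 0x80, 0x80])))
```

The naive decoder happily maps two different byte strings to the same code point, which is why a real decoder has to carry explicit rejection rules.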

That's a real flaw of UTF-8 in the long run. They should have biased the values of multibyte sequences, offsetting each length's range past what the shorter forms already cover, so that redundant encodings could not exist.
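A rough sketch of what such a bias could look like for the 2-byte case, in Python. This is a hypothetical scheme for illustration, not UTF-8 and not any existing standard; `BIAS_2`, `biased_encode_2byte`, and `biased_decode_2byte` are made-up names. The idea is to subtract the 128 code points already covered by 1-byte forms before packing the bits.

```python
BIAS_2 = 0x80  # code points 0x00..0x7F are already covered by 1-byte forms


def biased_encode_2byte(cp: int) -> bytes:
    """Encode cp as 110xxxxx 10yyyyyy after subtracting the bias (hypothetical)."""
    v = cp - BIAS_2
    if not 0 <= v < 0x800:  # 11 payload bits available
        raise ValueError("code point outside the biased 2-byte range")
    return bytes([0xC0 | (v >> 6), 0x80 | (v & 0x3F)])


def biased_decode_2byte(b0: int, b1: int) -> int:
    """Inverse of biased_encode_2byte; every bit pattern is a distinct value."""
    return BIAS_2 + (((b0 & 0x1F) << 6) | (b1 & 0x3F))


# The smallest 2-byte pattern, C0 80, now means U+0080 rather than an overlong U+0000.
print(biased_encode_2byte(0x80).hex())
print(hex(biased_decode_2byte(0xC0, 0x80)))
```

Under this scheme all 2048 two-byte patterns decode to distinct code points in 0x80..0x87F, so no lead byte has to be forbidden and the 2-byte range grows by 128 values. The same offset trick would extend to 3- and 4-byte forms.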


