submitted 2 months ago by throws_lemy@lemmy.nz to c/linux@programming.dev
MarekKnapek@programming.dev 6 points 2 months ago

No! It is a code point, not a glyph, that UTF-8 encodes in up to 4 code units. Glyphs do not map to code points one to one: one glyph can be made of more than one code point, and each code point can take more than one code unit. Code points are a Unicode thing, code units are a Unicode-encoding thing, and glyphs are a font+Unicode thing.

For example, the glyph á might be a single code point or two code points. It exists as a single code point (U+00E1) because it is a common letter in some languages and was already used in computers before Unicode was invented. It can also be two code points: the base letter a followed by a combining diacritic mark (U+0301). Not all letters with diacritics have a precomposed single-code-point variant.

Emojis are similar: a single glyph, but multiple code points. Examples are the skin tone modifiers for the various people emojis, or man and woman characters joined (with zero-width joiners) into a single family glyph. Country flags are also a single glyph made of multiple code points (a pair of regional indicator symbols).

Unicode is BIG, there is A LOT of stuff in it. Sorting based on the user's language is not trivial, and neither is conversion to upper/lower case (google the Turkish i).
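A minimal Python 3.9+ sketch (standard library only) that makes the code point / code unit distinction visible. The helper names `code_units_per_code_point` and `show` are hypothetical, made up for illustration. Python strings are sequences of code points, so `len()` counts code points, not glyphs; whether each emoji sequence actually renders as one glyph depends on the font, which is exactly the point above.

```python
import unicodedata

def code_units_per_code_point(s: str) -> list[int]:
    """Hypothetical helper: number of UTF-8 code units (bytes) for each code point in s."""
    return [len(ch.encode("utf-8")) for ch in s]

def show(label: str, s: str) -> None:
    """Hypothetical helper: print each code point of s with its name and UTF-8 bytes."""
    print(f"{label}: {len(s)} code point(s), {len(s.encode('utf-8'))} UTF-8 code unit(s)")
    for ch in s:
        name = unicodedata.name(ch, "<unnamed>")
        print(f"  U+{ord(ch):04X} {name} -> {ch.encode('utf-8').hex(' ')}")

# One code point takes 1 to 4 UTF-8 code units: A, e-acute, euro sign, grinning face.
print(code_units_per_code_point("A\u00e9\u20ac\U0001F600"))  # [1, 2, 3, 4]

# The same visible "á" as one precomposed code point vs. base letter + combining mark.
PRECOMPOSED = "\u00e1"  # LATIN SMALL LETTER A WITH ACUTE
DECOMPOSED = "a\u0301"  # "a" + COMBINING ACUTE ACCENT
show("precomposed", PRECOMPOSED)
show("decomposed", DECOMPOSED)
print(PRECOMPOSED == DECOMPOSED)                                # False: different code points
print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)  # True after normalization

# Sequences that are typically drawn as one glyph but are several code points.
THUMBS_UP_MEDIUM = "\U0001F44D\U0001F3FD"                # thumbs up + skin tone modifier
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man ZWJ woman ZWJ girl
FLAG_DE = "\U0001F1E9\U0001F1EA"                        # regional indicators D + E
for label, s in [("thumbs up + skin tone", THUMBS_UP_MEDIUM),
                 ("family", FAMILY),
                 ("flag", FLAG_DE)]:
    show(label, s)

# Case mapping: Python's str methods use the default, locale-independent rules.
print("I".lower())            # i  (Turkish would expect dotless ı, U+0131)
print(len("\u0130".lower()))  # 2: İ lowercases to "i" + COMBINING DOT ABOVE
```

Counting what a user perceives as one "character" (an extended grapheme cluster) is a separate step on top of this; the standard library does not do it, so in practice people reach for a third-party library.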

this post was submitted on 16 Nov 2025
208 points (98.6% liked)
