Re^22: Seeking Perl docs about how UTF8 flag propagates (Terminology)

by ikegami (Patriarch)
on May 23, 2023 at 14:52 UTC ( [id://11152392]=note )


in reply to Re^21: Seeking Perl docs about how UTF8 flag propagates (Terminology)
in thread Seeking Perl docs about how UTF8 flag propagates

Oh, you have a problem with the fact that you can store a byte in a character.

A character can be:

  1. Smallest addressable unit. Literally a synonym for byte.
  2. Element of the string.
  3. Grapheme
  4. Glyph
  5. Code point

In Perl, it has the second definition. There are no other words for this.
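A minimal sketch of that second definition, assuming only the core Encode module: the same string type holds bytes in one case and a code point in the other, and length() simply counts elements. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw( decode );

# A string whose two elements are the bytes of the UTF-8 encoding of U+C5.
my $bytes = "\xC3\x85";

# Decoding produces a string with a single element holding the code point.
my $text = decode('UTF-8', $bytes);

# length() counts string elements, whatever they happen to represent.
printf "%d element(s): %s\n", length($bytes),
    join ' ', map { sprintf '%02X', ord } split //, $bytes;
printf "%d element(s): %s\n", length($text),
    join ' ', map { sprintf 'U+%04X', ord } split //, $text;
```

In both cases Perl calls each element a character; whether it is a byte or a code point depends on what the program put there.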

You apparently associate character with one of the last three. I don't know which.

For example, take a look at Å [U+212B], Å [U+C5] and Å [U+41,U+30A].

  • They are all the same glyph, but the first one has a different grapheme.
  • The last two are the same grapheme, but all use different code points.
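Here is a rough sketch of how those three strings look from inside Perl, assuming the core Unicode::Normalize module. Note that \X matches an extended grapheme cluster as Unicode defines it, so it can count clusters but says nothing about whether two clusters are "the same" grapheme. The hash keys are just labels made up for this example.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC );

my %forms = (
    angstrom_sign => "\x{212B}",        # ANGSTROM SIGN
    precomposed   => "\x{C5}",          # LATIN CAPITAL LETTER A WITH RING ABOVE
    decomposed    => "\x{41}\x{30A}",   # A followed by COMBINING RING ABOVE
);

for my $name (qw( angstrom_sign precomposed decomposed )) {
    my $s = $forms{$name};

    my $elements    = length($s);             # string elements (definition 2)
    my $clusters    = () = $s =~ /\X/g;       # extended grapheme clusters
    my $code_points = join ',', map { sprintf 'U+%04X', ord } split //, $s;

    printf "%-13s elements=%d clusters=%d code points=%s\n",
        $name, $elements, $clusters, $code_points;
}

# The last two differ element by element but compose to the same string.
my $same = NFC($forms{decomposed}) eq $forms{precomposed};
print "decomposed eq precomposed after NFC: ", ($same ? "yes" : "no"), "\n";
```

The decomposed form has two elements but one cluster, which is exactly the kind of gap that makes the bare word "character" ambiguous.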

So when you say character, do you think that all three of those things are the same? Only two? None of them? I have no idea. Unicode suggests most people would consider that list to have two characters: the Angstrom Sign, and Latin Capital Letter A with Ring Above. But most people isn't everyone. And that's why you should use a more precise term than character if you mean grapheme, glyph or code point. Standards exist for a reason.
