PerlMonks  

Re^19: Seeking Perl docs about how UTF8 flag propagates (Terminology)

by LanX (Saint)
on May 22, 2023 at 20:48 UTC ( [id://11152368]=note )


in reply to Re^18: Seeking Perl docs about how UTF8 flag propagates (Terminology)
in thread Seeking Perl docs about how UTF8 flag propagates

> Are you talking about something I said?

Rather about something you asked. See Re^15: Seeking Perl docs about how UTF8 flag propagates (Terminology)

> Anything larger than 0xF_FFFF_FFFF will take 13.

Thanks, interesting.
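
For anyone who wants to check the quoted figure, here is a minimal sketch. It assumes a 64-bit perl (the larger code points do not fit in a 32-bit integer) and relies on Perl's own extended UTF-8 for code points above 0x10FFFF, which is not standard UTF-8. The helper name utf8_len is made up for illustration.

```perl
use strict;
use warnings;
# Code points above 0x10FFFF are not Unicode, and hex literals above
# 0xFFFF_FFFF are not portable to 32-bit perls; silence both warnings.
no warnings qw(non_unicode portable);

# Hypothetical helper: number of bytes Perl uses to encode one code point.
sub utf8_len {
    my ($cp) = @_;
    my $s = chr($cp);
    utf8::encode($s);      # replace the string in place with its UTF-8 bytes
    return length($s);     # each string element is now one byte
}

for my $cp (0x7F, 0x7FF, 0xFFFF, 0x10_FFFF, 0x7FFF_FFFF,
            0xF_FFFF_FFFF, 0x10_0000_0000) {
    printf "0x%X takes %d byte(s)\n", $cp, utf8_len($cp);
}
```

The last two values sit on either side of the 0xF_FFFF_FFFF boundary mentioned in the quote, so the jump in length should show up between them.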

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)


Replies are listed 'Best First'.
Re^20: Seeking Perl docs about how UTF8 flag propagates (Terminology)
by ikegami (Patriarch) on May 23, 2023 at 01:34 UTC

    You said "And I cringe about calling a byte a character." What does that even mean? Did someone say a byte is a character? Are you talking about something I said? If so, what?

    Your explanations, including the one to which you just linked, do not provide clarity.

      ehm ... I literally quoted the portion from the docs.

      Please, let's stop it here.

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)

        Oh, you have a problem with the fact that you can store a byte in a character.

        A character can be:

        1. Smallest addressable unit. Literally a synonym for byte.
        2. Element of the string.
        3. Grapheme
        4. Glyph
        5. Code point

        In Perl, it has the second definition. There are no other words for this.
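
        A small sketch of that second definition, using only core functions: length counts string elements, and whether an element holds a byte or a code point depends on what was put into the string. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Two string elements, each holding a byte: the UTF-8 encoding of U+C5.
my $encoded = "\xC3\x85";

# Decode a copy: one string element holding the code point U+C5.
my $decoded = $encoded;
utf8::decode($decoded) or die "not valid UTF-8";

printf "encoded: %d element(s)\n", length($encoded);
printf "decoded: %d element(s), first is U+%04X\n",
    length($decoded), ord($decoded);
```

        Both values are ordinary Perl strings; nothing in the language marks one as "bytes" and the other as "text".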

        You apparently associate character with one of the last three. I don't know which.

        For example, take a look at Å [U+212B], Å [U+C5] and Å [U+41,U+30A].

        • They are all the same glyph, but the first one has a different grapheme.
        • The last two are the same grapheme, but all use different code points.

        So when you say character, do you think that all three of those things are the same? Only two? None of them? I have no idea. Unicode suggests most people would consider that list to have two characters: the Angstrom sign, and Latin Capital Letter A with Ring Above. But most people isn't everyone. And that's why you should use a more precise term than character if you mean grapheme, glyph or code point. Standards exist for a reason.
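
        To make the three spellings of Å concrete, here is a hedged sketch using the core Unicode::Normalize module. It counts code points with length, counts extended grapheme clusters with \X, and checks what NFC turns each spelling into. The @forms array and its labels are made up for this example.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Illustrative list of [label, string] pairs for the three spellings.
my @forms = (
    [ 'U+212B',     "\x{212B}"      ],  # ANGSTROM SIGN
    [ 'U+C5',       "\x{C5}"        ],  # LATIN CAPITAL LETTER A WITH RING ABOVE
    [ 'U+41,U+30A', "\x{41}\x{30A}" ],  # A followed by COMBINING RING ABOVE
);

for my $form (@forms) {
    my ($label, $s) = @$form;
    my $code_points = length($s);
    my $clusters    = () = $s =~ /\X/g;   # count extended grapheme clusters
    my $nfc         = join ',', map { sprintf 'U+%04X', ord } split //, NFC($s);
    printf "%-11s code points: %d, clusters: %d, NFC: %s\n",
        $label, $code_points, $clusters, $nfc;
}
```

        Normalization treats the three spellings as canonically equivalent, but that is a relation defined by Unicode; it does not settle which of the five meanings someone has in mind when they say "character".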
