in reply to Re^5: Simplest Possible Way To Disable Unicode
in thread Simplest Possible Way To Disable Unicode

Perl should never treat a number as a 'wide character' without explicit notification from the programmer that that is his intent.

Judging by your example, I think you mean you don't want wide characters to automatically get encoded to UTF-8. (Correct me if I'm wrong.)

What do you propose instead? I can think of a couple.

The term 'character' has no meaning outside of some mapping.

Characters have no meaning outside a mapping, but the term does. It's simply the basic unit of a string.

And even when it can be so mapped, until it is mapped, it is still just a number.

I fully agree. That's why I said pack doesn't deal with Unicode. It just deals with numbers. So do chr, ord, substr, index, etc.

Operators that do use mappings are lc, \d in regex patterns, etc.
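
To make the distinction concrete, here is a minimal sketch. It is an illustration only, not code from earlier in the thread; the sample code points (300, 5, 70000, U+0410, U+0663) are arbitrary choices. The first half treats a string purely as a sequence of numbers, and the second half uses operators that have to consult a mapping.

```perl
use strict;
use warnings;

# chr, ord, length, substr and index only see a sequence of numbers.
my $s   = chr(300) . chr(5) . chr(70000);
my $len = length($s);               # number of elements in the string
my $mid = ord(substr($s, 1, 1));    # the second element, as a number
my $pos = index($s, chr(70000));    # position of the third element

# lc and \d need a mapping (Unicode) to decide what the numbers mean.
my $lower    = ord(lc(chr(0x0410)));            # CYRILLIC CAPITAL LETTER A, lowercased
my $is_digit = chr(0x0663) =~ /^\d$/ ? 1 : 0;   # ARABIC-INDIC DIGIT THREE

printf "%d %d %d %#x %d\n", $len, $mid, $pos, $lower, $is_digit;
```

The first three results would come out the same whatever characters the numbers are taken to stand for; the last two depend entirely on the mapping.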

And 4294967296, much less 18446744073709551616 cannot be mapped to 'a character' in any known or proposed mapping.

No, but 4294967295 is a valid character.

>perl -E"say ord chr 4294967295"
4294967295

Perl uses utf8 (not to be confused with UTF-8), an encoding whose charset consists of 2**72 characters. Only up to UVMAX is supported, though.

Unicode support in Perl is broken.

I'm not going to discuss this because this thread has nothing to do with Unicode.

The OP tried to send non-bytes to a file handle, and you tried to store something larger than a byte in a byte. A warning and dying aren't unwarranted.
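
A rough sketch of the file handle case, assuming in-memory handles as a stand-in for real files (the variable names and the U+263A sample are mine, not the OP's). Printing a string element above 255 to a byte-oriented handle triggers the "Wide character" warning, while a handle opened with an explicit encoding layer accepts the same string quietly.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $smiley = chr(0x263A);    # one string element that does not fit in a byte

# Byte-oriented handle: no mapping was requested, so perl warns.
open(my $raw, '>', \my $raw_buf) or die "open: $!";
print {$raw} $smiley;
close($raw);

# Handle with an explicit layer: the programmer has stated the mapping.
open(my $enc, '>:encoding(UTF-8)', \my $enc_buf) or die "open: $!";
print {$enc} $smiley;
close($enc);

printf "warnings=%d raw_bytes=%d encoded_bytes=%d\n",
    scalar(@warnings), length($raw_buf), length($enc_buf);
```

Only the first print should add an entry to @warnings; the second handle was told how to turn the number into bytes, so there is nothing to complain about.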

Re^7: Simplest Possible Way To Disable Unicode
by BrowserUk (Patriarch) on May 24, 2011 at 08:39 UTC
    Perl uses utf8 (not to be confused with UTF-8),

    It is so sad to see apparently intelligent men make such stupid statements.

    Of course 'utf8' will be confused with 'UTF-8'. Search for the former using any search engine or on any reference site, and all you will find are references to the latter.

    an encoding whose charset consists of 2**72 characters.

    Wrong! At best this mythical 'utf8' stores 2**72 ordinal values that could be mapped to a charset.

    But no such charset exists, nor any that contains even 0.000000000000003% of that stupidly huge number, which makes the entire thing totally fallacious.

    Numbers are not characters. They are just numbers.

    Those numbers can be code points that can be mapped to characters. But you cannot map a number that is greater than the number of characters that exist.

    And a 'character' has a very clearly defined meaning. Even in the standard you keep (mis)quoting: Unicode, in intent, encodes the underlying characters—graphemes and grapheme-like units—rather than the variant glyphs (renderings) for such characters.

    The fact that you think you know better says it all for me so I'm done. If you're after another 37 levels, you're on your own.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.