PerlMonks  

Re^4: Converting Unicode

by ikegami (Patriarch)
on Dec 03, 2023 at 01:32 UTC ( [id://11156045] )


in reply to Re^3: Converting Unicode
in thread Converting Unicode

> I didn't say Perl doesn't support Unicode. You're putting words in my mouth. Look closely at what I said.

Yes you did:

"Perl is not yet fully unicode compatible"

Quite the opposite. Perl has the best support for Unicode of any language I've seen. (That includes C, C#, Java, JS and Python.)

> Is it true or is it not true that Perl is not using utf8 encodings natively?

It is one of two encodings it uses natively, yes. It's an encoding specifically created for Perl, after all. Not that it has any relevance to the subject.

> The very fact that one must tease Perl into using utf8

No, the fact that handles are byte handles by default does not mean that Perl doesn't support Unicode.
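To make "byte handles by default" concrete, here is a minimal sketch using only the core Encode module. The sample bytes and variable names are made up for illustration; the point is just that decoding and encoding are explicit steps, whether done on a string or attached to a handle as a layer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they would arrive from a handle with no layers: "é" encoded as UTF-8.
my $bytes = "\xC3\xA9";

# Without decoding, Perl sees two separate bytes.
my $byte_len = length $bytes;

# Decoding turns the two bytes into one character (U+00E9).
my $text     = decode('UTF-8', $bytes);
my $char_len = length $text;

# Encoding goes the other way, for output to a byte handle.
my $out = encode('UTF-8', $text);

# The same decoding can be attached to a handle as an I/O layer.
# An in-memory handle stands in for a real file here.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $from_handle = <$fh>;
close $fh;

printf "%d byte(s), %d character(s)\n", $byte_len, $char_len;
# prints "2 byte(s), 1 character(s)"
```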

> the very fact that its developers still see security risks in its use

...means nothing at all. That has nothing to do with Perl. There are security issues that all programmers who deal with UTF-8, UTF-16 and Unicode must be aware of. Nothing in the document you linked is specific to Perl.

For example, allowing https://sаfe.com/ to be displayed is such an issue, since that doesn't say "safe.com". (And that's why it shows up as https://xn--sfe-6cd.com/ in the URL bar of Firefox.)
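As a rough illustration of that kind of check, here is a sketch that flags a host label mixing scripts, using Perl's Unicode script properties. The helper name `scripts_in` and the short list of scripts are assumptions for the example, and the spoofed label assumes the look-alike letter is Cyrillic U+0430. Real browsers apply more elaborate rules than this.

```perl
use strict;
use warnings;

# Scripts to look for; a real check would cover many more.
my %script_re = (
    Latin    => qr/\p{Script=Latin}/,
    Cyrillic => qr/\p{Script=Cyrillic}/,
    Greek    => qr/\p{Script=Greek}/,
);

# Hypothetical helper: return the sorted names of the scripts found in a label.
sub scripts_in {
    my ($label) = @_;
    my @found = grep { $label =~ $script_re{$_} } sort keys %script_re;
    return @found;
}

my $plain = "safe";
my $spoof = "s\x{430}fe";   # second letter assumed to be CYRILLIC SMALL LETTER A

for my $label ($plain, $spoof) {
    my @scripts = scripts_in($label);
    # Only the script names are printed, so no wide characters reach STDOUT.
    print scalar(@scripts) > 1 ? "mixed scripts: @scripts\n"
                               : "single script: @scripts\n";
}
```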

Replies are listed 'Best First'.
Re^5: Converting Unicode
by Polyglot (Chaplain) on Dec 03, 2023 at 02:15 UTC
    "Perl is not yet fully unicode compatible" is NOT the same as "Perl doesn't support Unicode."

    compatible | kəmˈpadəb(ə)l |
    adjective
    (of two things) able to exist or occur together without conflict: 
    the fruitiness of Beaujolais is compatible with a number of meat dishes.
    • (of two people) able to have a harmonious relationship: well-suited: 
    it's a pity we're not compatible.
    • (of one thing) consistent with another: the symptoms were compatible 
    with gastritis or a peptic ulcer.
    • Computing (of a computer, a piece of software, or other device) able 
    to be used with a specified piece of equipment or software without special 
    adaptation or modification: the printer is fully compatible with all 
    leading software.
    

    Note especially the definition given under the "Computing" sense of meaning: "able to be used with a specified piece of equipment or software without special adaptation or modification."

    There are multiple reasons for saying that "special adaptation or modification" applies here to usage of UTF8 in Perl, not least of which is that certain incantations must be made to cause Perl to handle it properly--it is NOT the default behavior of Perl to use UTF8!

    Does Perl support Unicode? In a sense yes. It allows Unicode to be used, provided one adapts his or her code in certain ways. This can qualify as "support." It does not, however, qualify as being fully "compatible"--as the code must be specially adapted to use UTF8.

    EDIT: One can easily see, too, that this forum is not especially UTF8-compatible, either. I used the "<pre>" tags for that dictionary definition, which are supposed to handle UTF8 better than the "<code>" tags. Well, the resultant rendering of the unicode characters is on display, and I shall leave it thus.

    Blessings,

    ~Polyglot~

      I understand the argument you're making, but I disagree about the word "compatible". I think a more accurate way of saying it is "Perl does not assume a unicode environment", "the unicode support is opt-in", and "getting Perl to treat its environment as unicode requires a lot of tedious steps".

      For contrast, Python 3 does assume a unicode environment, giving people that convenient out-of-the-box support feel, but Python 2 did not, and it caused a great deal of breakage to change that assumption. Perl will probably never change the default, in order to maintain backward compatibility. There are many environments that really still aren't Unicode, and Perl still needs to run in those. There are in fact many more environments Perl can run in than Python, because of that.

      I do wish, though, that there was a simpler option like an environment variable or command line switch that would make Perl assume a unicode environment. That option would probably break a bunch of modules and scripts, and would still need to be opt-in, but people could gradually start supporting it in the same way that we can run perl with Taint checking and see what that breaks. Most importantly though, having it be a single switch rather than dozens of switches all over would make a massive difference for convenience.
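For readers wondering what those steps look like, here is a sketch of a commonly used opt-in preamble. It is not an exhaustive list, and the sample word and variable names are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                             # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));   # STDIN/STDOUT/STDERR plus default layers for open()
use Encode qw(decode);

# Command-line arguments arrive as bytes, so they are decoded explicitly.
my @args = map { decode('UTF-8', $_) } @ARGV;

# With "use utf8", this literal is five characters rather than six bytes.
my $word = "naïve";
print length($word), "\n";
```

Each line covers a different boundary (source code, handles, arguments), which is why there is no single place to flip it all on from inside a script.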

        > I do wish, though, that there was a simpler option like ... command line switch that would make Perl assume a unicode environment

        What about the -C options? What's missing from your perspective?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery
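For reference, a small sketch of how a script can see which -C flags were given: the special variable `${^UNICODE}` holds the numeric setting, with the bit values documented in perlrun. The helper name `describe_flags` is made up for this example.

```perl
use strict;
use warnings;

# Bit values of the -C letters, as listed in perlrun.
my %flag = (
    1  => 'I: STDIN is assumed to be UTF-8',
    2  => 'O: STDOUT will be UTF-8',
    4  => 'E: STDERR will be UTF-8',
    8  => 'i: UTF-8 is the default layer for input streams',
    16 => 'o: UTF-8 is the default layer for output streams',
    32 => 'A: @ARGV elements are expected to be UTF-8',
    64 => 'L: the above apply only under a UTF-8 locale',
);

# Hypothetical helper: describe the flags set in a numeric -C value.
sub describe_flags {
    my ($setting) = @_;
    my @active = map  { $flag{$_} }
                 grep { $setting & $_ }
                 sort { $a <=> $b } keys %flag;
    return @active;
}

# Run as, for example:  perl -CSDA script.pl   (S = I+O+E, D = i+o)
my @active = describe_flags(${^UNICODE});
print @active ? join("\n", @active) . "\n" : "no -C flags active\n";
```

This does not cover `use utf8` for the source file itself, which may be part of what the previous poster feels is missing.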

      When something is compatible with a standard, it means it follows the standard.

      When something supports a standard, it means it follows the standard.

      They do indeed mean the same thing. Perhaps you should say what you mean instead of repeatedly insisting these two things don't mean the same thing?


      > It does not, however, qualify as being fully "compatible"--as the code must be specially adapted to use UTF8.

      Nonsense.

      My TV is fully compatible with multiple input protocols. But I still have to tell it which one to use.

      I have a device that's fully compatible with both the North American and European power grids, but a switch needs to be placed in the correct position before it's powered.

      To be fully compatible with Unicode does not require handles to provide decoded text by default, and it doesn't require handles to encode text by default. It doesn't require decoding or encoding at all, much less by default.


      > Does Perl support Unicode? In a sense yes. It allows Unicode to be used

      Supporting Unicode means a lot more than that.
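A few examples of what "a lot more" can look like, as a sketch using only core Perl and the core Unicode::Normalize module. The sample strings are made up, and this is a small sample of features, not a complete list.

```perl
use v5.16;                            # enables strict, fc and unicode_strings
use warnings;
use Unicode::Normalize qw(NFC);       # core module

my $composed   = "\x{E9}";            # "é" as a single code point
my $decomposed = "e\x{301}";          # "e" followed by COMBINING ACUTE ACCENT

# Normalization: the two spellings are canonically equivalent.
my $same = NFC($decomposed) eq $composed;

# Grapheme clusters: \X matches one user-perceived character,
# so this counts 1 even though length($decomposed) is 2.
my $clusters = () = $decomposed =~ /\X/g;

# Character properties: match by script rather than by code point range.
my $is_greek = "\x{3B1}" =~ /\p{Greek}/;      # GREEK SMALL LETTER ALPHA

# Case folding: "ß" (U+00DF) folds to "ss" for caseless comparison.
my $folds = fc("Stra\x{DF}e") eq fc("STRASSE");

say "normalized equal: ", ($same     ? "yes" : "no");
say "grapheme clusters: $clusters";
say "matches Greek: ",    ($is_greek ? "yes" : "no");
say "caseless equal: ",   ($folds    ? "yes" : "no");
```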
