
Re^7: Converting Unicode
by jeffenstein (Hermit) on Dec 20, 2023 at 09:57 UTC

    Seriously, just add export PERL_UNICODE=SDAL to your environment, and use utf8; to the top of the scripts that you write, and it will do exactly what you keep asking for in your posts. If you're willing to do the research, you can read 'perldoc perlrun' and 'perldoc utf8' to see why.
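    A minimal sketch of what that setup buys you, assuming a UTF-8 terminal. Here the `open` pragma stands in for the S and D flags of PERL_UNICODE so the script does not depend on the environment; it does not cover the A (decode @ARGV) or L (locale-conditional) flags. The helper name char_and_byte_counts is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                              # string literals in this file are UTF-8
use open qw(:std :encoding(UTF-8));    # UTF-8 layers on STDIN/STDOUT/STDERR and default handles
use Encode qw(encode);

# Hypothetical helper: return (character count, UTF-8 byte count)
# for an already-decoded string.
sub char_and_byte_counts {
    my ($text) = @_;
    return (length($text), length(encode('UTF-8', $text)));
}

my $word = "naïve";
my ($chars, $bytes) = char_and_byte_counts($word);
print "$word: $chars characters, $bytes bytes as UTF-8\n";
```

    With `use utf8` in place, length() counts characters rather than bytes; the byte count only reappears when the string is explicitly encoded for output.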

Re^7: Converting Unicode
by choroba (Cardinal) on Dec 19, 2023 at 18:34 UTC
    As a linguist, I've worked with various languages including Arabic, Chinese, and Tamil. We processed corpora in those languages in Perl, we even built a treebank annotation tool in Tk. We never had problems with Unicode. 🤷🏽

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      If you never had problems, it's because you were experienced enough to stay on top of things and not merely allow Perl to just do its thing.
      Perl uses UTF-8 only when it thinks it is beneficial, so if all the characters in your string are in the range 0..255, there's a good chance the characters are all packed in bytes--but in the absence of other knowledge, you can't be sure because Perl converts between fixed 8-bit characters and variable-length UTF-8 characters as necessary. (Programming Perl, p. 403)
      The "as necessary" is not necessarily as you might wish, as those less experienced quickly learn the hard way. Even the experienced, facing more complex requirements (just working with Chinese is not necessarily complex--it depends on the workflow and the forms of I/O required), often find hidden "gotchas," such as with locales, filenames, databases, incorporating other Perl modules, etc.
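      The two internal representations the book describes can be inspected with the always-available utf8:: functions. This is only a sketch: the variable names and the internal_form helper are made up for illustration, and utf8::is_utf8 reports an internal storage detail, not whether the data is "Unicode".

```perl
use strict;
use warnings;

# All code points are <= 255, so Perl typically stores this one byte per character.
my $bytes_form = "caf\x{e9}";

# Same characters, but forced into the internal UTF-8 representation.
my $utf8_form = $bytes_form;
utf8::upgrade($utf8_form);

# Hypothetical helper: name the internal storage format of a string.
sub internal_form {
    my ($s) = @_;
    return utf8::is_utf8($s) ? 'UTF-8' : 'bytes';
}

printf "first:  %s, length %d\n", internal_form($bytes_form), length $bytes_form;
printf "second: %s, length %d\n", internal_form($utf8_form),  length $utf8_form;
printf "equal as strings: %s\n", ($bytes_form eq $utf8_form ? 'yes' : 'no');
```

      The strings compare equal and have the same length whichever way they are stored; trouble usually starts at the I/O boundary, where undecoded input or unencoded output mixes bytes and characters.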

      Blessings,

      ~Polyglot~

Re^7: Converting Unicode
by Jenda (Abbot) on Dec 20, 2023 at 23:16 UTC
    Up to now, I've been forced to maintain a strict ordering of parameters, and many of my subroutines have so many parameters as to make such code a nightmare to maintain or amend.

    Maybe ... maybe ... just maybe ... you should learn to design your code better. Subroutines with too many parameters are definitely a code smell. If your subroutines have so many parameters that getting their order right is a problem, you are using subroutines wrong.
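
    One common way to drop the strict ordering is to pass name/value pairs. The sketch below is illustrative only: format_label and its options are hypothetical, not taken from the code under discussion.

```perl
use strict;
use warnings;

# Hypothetical example: options are passed by name, so call order no longer matters.
sub format_label {
    my (%opt) = (width => 10, pad => ' ', @_);   # defaults first, caller's pairs override
    my $text = $opt{text} // '';
    my $fill = $opt{width} - length($text);
    $fill = 0 if $fill < 0;                      # never truncate, only pad
    return $text . ($opt{pad} x $fill);
}

# Both calls pass the same options in a different order.
print format_label(text => 'abc', width => 6, pad => '.'), "\n";
print format_label(pad => '.', width => 6, text => 'abc'), "\n";
```

    Named arguments help with ordering, but they do not fix a subroutine that is simply doing too much.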

    Jenda
    1984 was supposed to be a warning,
    not a manual!

Re^7: Converting Unicode
by kcott (Archbishop) on Dec 19, 2023 at 18:45 UTC
    "Even the Perl manual which I have in hardcopy form says this: ... (Programming Perl, p. 402)"

    Programming Perl is not the Perl manual! Its first edition was a book describing Perl4.

    Subsequent editions of that book described early versions of Perl5: Ed2=5.003; Ed3=5.6; Ed4=5.16 — none of which were hardcopies of the Perl manual.

    You seem to be posting a lot of FUD, especially in relation to Unicode. Please stop doing that.

    — Ken

      I have the third edition. As it says on its back cover:
      What's new in this edition? Practically everything. This third edition of Programming Perl has not only been expanded to cover the new 5.6 release of Perl, it also has been completely reorganized and fortified with numerous examples. Most existing topics have been dramatically reworked from the ground up, like object-oriented programming and regular expressions, and many brand new chapters have been added, including those on profiling, pod, Unicode, threading, compiling, and Perl internals.

      Part bible, part encyclopedia, and part almanac, this is the essential book on Perl.

      Note: "Unicode" was a brand-new chapter in this edition.

      Blessings,

      ~Polyglot~

        5.6? 23 years ago!

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]