Re: Converting Unicode

Perl is not yet fully Unicode-compatible, even though we will soon ring in the year 2024. Perl's official documentation still flags security risks with Unicode, saying, for example: "Also, the use of Unicode may present security issues that aren't obvious, see 'Security Implications of Unicode' below." There are, however, some ways to get around this. One is to declare explicitly, in your own code, that you want Unicode, using pragmas and I/O layers such as these:

```perl
use utf8;  #FOR THE "wide characters" IN YOUR OWN CODE

binmode STDIN,  ":utf8"; #FOR INCOMING UTF8
binmode STDOUT, ":utf8"; #FOR OUTGOING UTF8
binmode STDERR, ":utf8"; #AND FOR ERRORS SEPARATELY

use open qw/:std :utf8/;        #THIS ONE CAN BE PROBLEMATIC WITH DATABASE INTERACTIONS
use open ':encoding(utf8)';     #ANOTHER WAY OF SAYING IT

use feature 'unicode_strings';  #ANOTHER PART OF 'TMTOWTDI' FOR PERL UNICODE
```
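Putting pragmas like these together in one short script makes their effect visible. A minimal sketch, assuming the file is saved as UTF-8 and using the validating `:encoding(UTF-8)` layer rather than `:utf8`; `char_and_byte_lengths` is a hypothetical helper name:

```perl
use strict;
use warnings;
use utf8;                               # string literals in this file are decoded as UTF-8
use open qw(:std :encoding(UTF-8));     # validating UTF-8 layer on STDIN/STDOUT/STDERR and on open()

# Hypothetical helper: a string's length in characters and in UTF-8 bytes
sub char_and_byte_lengths {
    my ($text) = @_;
    my $bytes = $text;
    utf8::encode($bytes);               # convert the copy to UTF-8 bytes in place
    return (length($text), length($bytes));
}

my $word = "café";                      # 4 characters, thanks to 'use utf8'
my ($chars, $octets) = char_and_byte_lengths($word);
print "$word: $chars characters, $octets bytes\n";
```

It should print `café: 4 characters, 5 bytes`, since `é` is a single character but takes two bytes in UTF-8.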

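When the code is someone else's, such as a module you pull in, the situation becomes more problematic, because you do not control whether it expects and returns bytes or decoded characters. Be careful which modules you choose to incorporate.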
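One defensive pattern with third-party code is to decode and encode explicitly at the boundary instead of relying on global layers. A minimal sketch, assuming (hypothetically) a module that takes and returns UTF-8 bytes; `bytes_to_text` and `text_to_bytes` are illustrative names, while `decode`, `encode` and `FB_CROAK` come from the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes coming back from the (hypothetical) module -> Perl character string.
# FB_CROAK makes decode() die on malformed UTF-8 instead of substituting characters.
sub bytes_to_text {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, Encode::FB_CROAK);
}

# Perl character string -> UTF-8 bytes to hand to the module.
sub text_to_bytes {
    my ($text) = @_;
    return encode('UTF-8', $text, Encode::FB_CROAK);
}

my $text = bytes_to_text("caf\xc3\xa9");        # decodes to 4 characters
my $ok   = eval { bytes_to_text("\xff"); 1 };   # malformed input makes decode() die
print length($text), " characters; malformed input ", ($ok ? "accepted" : "rejected"), "\n";
```

This prints `4 characters; malformed input rejected`. Decoding right where data enters your code keeps the rest of the program working with character strings, whatever the module does internally.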

Of course, if these options fail and the UTF-8 characters are not essential to your application, you can also remove them all and stick with a pure-ASCII solution. If UTF-8 is not important to you, this may cause the least headache: you could then use virtually any module and have no issues with I/O. But it will not be very future-proof.
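If you do go the pure-ASCII route, it helps to catch stray non-ASCII characters early rather than discovering them in output. A minimal sketch; `is_pure_ascii` is a hypothetical helper, not a standard function:

```perl
use strict;
use warnings;

# Hypothetical helper: true when every character is in the 7-bit ASCII range
sub is_pure_ascii {
    my ($str) = @_;
    return $str !~ /[^\x00-\x7F]/;      # fails as soon as any code point above 0x7F appears
}

for my $sample ("plain text", "caf\x{e9}") {
    print is_pure_ascii($sample) ? "ASCII only\n" : "contains non-ASCII\n";
}
```

The loop prints `ASCII only` and then `contains non-ASCII`.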

I look forward to the day when Perl uses Unicode natively, by default. It's too bad that day is not already here.

See more here: https://perldoc.perl.org/perlunicode

Blessings,

~Polyglot~


Re^2: Converting Unicode by ikegami (Patriarch) on Dec 02, 2023 at 19:26 UTC
Pretty odd that you mention Perl doesn't support Unicode while showing it does, and pretty odd that you mention security risks then proceed to use `:utf8` (whose non-validating nature can produce corrupt scalars) instead of `:encoding(UTF-8)`.
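The validating `:encoding(UTF-8)` layer is applied exactly where `:utf8` would go. A minimal sketch, assuming an in-memory filehandle stands in for a real file; `round_trip` is a hypothetical name. With malformed bytes, this layer should report a "does not map to Unicode" problem rather than quietly building a corrupt string:

```perl
use strict;
use warnings;

# Write characters through the validating layer into an in-memory buffer,
# then read them back through the same layer.
sub round_trip {
    my ($text) = @_;
    my $buffer = '';
    open my $out, '>:encoding(UTF-8)', \$buffer or die "open for write: $!";
    print {$out} $text;
    close $out or die "close: $!";

    open my $in, '<:encoding(UTF-8)', \$buffer or die "open for read: $!";
    local $/;                           # slurp mode: read everything at once
    my $decoded = <$in>;
    close $in;
    return ($buffer, $decoded);
}

my ($bytes, $back) = round_trip("caf\x{e9}");
printf "%d bytes written, %d characters read back\n", length($bytes), length($back);
```

This prints `5 bytes written, 4 characters read back`: the buffer holds UTF-8 bytes, and reading through the layer restores the original characters.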
Re^2: Converting Unicode by Polyglot (Chaplain) on Dec 18, 2023 at 19:36 UTC
I've been reading the documentation for Perl 6, aka "Raku"... and I think I'm falling in love again. Unlike Perl 5, Raku is UTF-8-based, both in its code and in its I/O. In their words, from the "Lexical Conventions" entry:

"Raku code is Unicode text. Current implementations support UTF-8 as the input encoding. See also Unicode versus ASCII symbols."

And from the "Normalization" entry:

"Raku applies normalization by default to all input and output except for file names, which are read and written as UTF8-C8; graphemes, which are user-visible forms of the characters, will use a normalized representation."

Everything I've been reading fits what I've been needing. Perhaps it's time for a new language; I'm on the verge of taking that plunge. The UTF-8 issue has troubled me with Perl 5 for a long time, and it is the proverbial straw that broke the camel's back (perhaps the pun is fitting).

Blessings,

~Polyglot~
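Perl 5 does not normalize strings on input or output the way the quoted Raku documentation describes, but the core Unicode::Normalize module and the `\X` regex escape (which matches one extended grapheme cluster, i.e. one user-visible character) let you do the same thing explicitly. A minimal sketch; `grapheme_count` is a hypothetical helper name:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper: count user-visible characters (extended grapheme clusters)
sub grapheme_count {
    my ($str) = @_;
    my $count = () = $str =~ /\X/g;    # list-assignment trick counts the matches
    return $count;
}

my $decomposed = "e\x{301}";           # 'e' followed by COMBINING ACUTE ACCENT
my $composed   = NFC($decomposed);     # normalized to the single code point U+00E9

printf "code points before: %d, after NFC: %d, graphemes: %d\n",
    length($decomposed), length($composed), grapheme_count($decomposed);
```

This prints `code points before: 2, after NFC: 1, graphemes: 1`; the difference from Raku is that nothing here happens unless you ask for it.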
Re^3: Converting Unicode by 1nickt (Canon) on Dec 19, 2023 at 10:21 UTC
FYI, Raku is *formerly* known as Perl 6, not *alternatively*. Raku is *not* Perl, despite being described in some places as a "sister language."

The way forward always starts with a minimal test.
Re^4: Converting Unicode by Polyglot (Chaplain) on Dec 19, 2023 at 14:24 UTC
It isn't so cut and dried. Yes, it was renamed, but the documentation still contains many references to Perl 6, for example: https://docs.raku.org/language/5to6-nutshell

Also, they have maintained a considerable amount of compatibility with Perl 5 modules, coding conventions, syntax, and so on, much of which is enabled optionally or via a module created specially for the purpose.

Blessings,

~Polyglot~
Re^5: Converting Unicode by hippo (Archbishop) on Dec 19, 2023 at 14:31 UTC
A reply falls below the community's threshold of quality. You may see it by logging in.