"... and I am unsure what use open ":std", ":encoding(UTF-8)"; is doing that isn't already there from use utf8; and ..." [my emphasis]
The utf8 pragma tells Perl that your source code is encoded in UTF-8 (i.e. that it may contain literal Unicode characters).
It has nothing to do with input or output.
See the emboldened text near the start of the DESCRIPTION in the utf8 documentation:
"Do not use this pragma for anything else than telling Perl that your script is written in UTF-8."
In my code example which you referenced, there is the statement "say q{🐶 = }, "🐶";".
That statement contains Unicode characters and therefore I need "use utf8;".
The rest of that code only contains 7-bit ASCII characters.
If I were to remove "say q{🐶 = }, "🐶";", I wouldn't need "use utf8;".
$ perl -E '
use strict;
use warnings;
use open OUT => qw{:encoding(UTF-8) :std};
say q{\x{1f436} = }, "\x{1f436}";
say q{\x{1F436} = }, "\x{1F436}";
say q{\N{DOG FACE} = }, "\N{DOG FACE}";
'
\x{1f436} = 🐶
\x{1F436} = 🐶
\N{DOG FACE} = 🐶
I don't know whether this is difficult to get your head around (and I certainly don't mean to be patronising or condescending),
but "\N{DOG FACE}", as it appears in the source code, consists of twelve 7-bit ASCII characters:
you do not need the utf8 pragma for this.
The "\N{DOG FACE}" escape resolves to a single Unicode character,
and the open pragma handles encoding that character for output.
Here's what happens if I omit the open pragma:
$ perl -E '
use strict;
use warnings;
say q{\x{1f436} = }, "\x{1f436}";
say q{\x{1F436} = }, "\x{1F436}";
say q{\N{DOG FACE} = }, "\N{DOG FACE}";
'
Wide character in say at -e line 4.
\x{1f436} = 🐶
Wide character in say at -e line 5.
\x{1F436} = 🐶
Wide character in say at -e line 6.
\N{DOG FACE} = 🐶
And note that all of those "Wide character" warnings remain if I use the utf8 pragma:
$ perl -E '
use strict;
use warnings;
use utf8;
say q{\x{1f436} = }, "\x{1f436}";
say q{\x{1F436} = }, "\x{1F436}";
say q{\N{DOG FACE} = }, "\N{DOG FACE}";
say q{🐶 = }, "🐶";
'
Wide character in say at -e line 5.
\x{1f436} = 🐶
Wide character in say at -e line 6.
\x{1F436} = 🐶
Wide character in say at -e line 7.
\N{DOG FACE} = 🐶
Wide character in say at -e line 8.
Wide character in say at -e line 8.
🐶 = 🐶
As further examples, see "uparse - Parse Unicode strings", and its improved version "Re: Decoding @ARGV [Was: uparse - Parse Unicode strings]",
both of which read and write Unicode characters and use the open pragma,
but neither contains Unicode characters in the source code so neither uses the utf8 pragma.
[Aside:
I'm going away for Christmas and won't be touching any computer equipment.
Christmas Day is less than two hours away in my time zone.
If you have further questions regarding this, I won't get to them for a few days,
although someone else might.]