"... and I am unsure what use open ":std", ":encoding(UTF-8)"; is doing that isn't already there from use utf8; and ..." [my emphasis]

The utf8 pragma tells Perl that your source code is written in UTF-8; that is, that it may contain literal non-ASCII characters. It has nothing to do with input or output. See the emboldened text near the start of the DESCRIPTION in that documentation:

"Do not use this pragma for anything else than telling Perl that your script is written in UTF-8."

In my code example which you referenced, there is the statement "say q{🐶 = }, "🐶";". That statement contains Unicode characters and therefore I need "use utf8;". The rest of that code only contains 7-bit ASCII characters. If I were to remove "say q{🐶 = }, "🐶";", I wouldn't need "use utf8;".
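To see what "use utf8;" changes, here is a minimal sketch, assuming the script is saved as a UTF-8 encoded file. The variable names are only for illustration. The pragma is lexically scoped, so the same literal can be compared inside and outside its scope:

```perl
use strict;
use warnings;

my ($with_utf8, $without_utf8);

{
    # Inside this block the source is decoded as UTF-8,
    # so the literal is a single character.
    use utf8;
    $with_utf8 = length "🐶";
}

{
    # No utf8 pragma in this block: Perl sees the four raw bytes
    # that make up the UTF-8 encoding of U+1F436.
    $without_utf8 = length "🐶";
}

# Only ASCII is printed, so no output layer is needed here.
print "with utf8:    $with_utf8\n";       # 1
print "without utf8: $without_utf8\n";    # 4
```

Nothing here touches input or output: the pragma only affects how the literal in the source is read.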

$ perl -E '
    use strict;
    use warnings;
    use open OUT => qw{:encoding(UTF-8) :std};
    say q{\x{1f436} = }, "\x{1f436}";
    say q{\x{1F436} = }, "\x{1F436}";
    say q{\N{DOG FACE} = }, "\N{DOG FACE}";
'
\x{1f436} = 🐶
\x{1F436} = 🐶
\N{DOG FACE} = 🐶

I don't know if it's difficult to get your head around (and I certainly don't mean to be patronising or condescending) but "\N{DOG FACE}" is part of the source code which contains twelve 7-bit ASCII characters: you do not need the utf8 pragma for this.

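At run time, "\N{DOG FACE}" resolves to a single Unicode character. Writing that character out is a separate matter, and the open pragma handles it by putting a UTF-8 encoding layer on the standard output handles. Here's what happens if I omit the open pragma: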

$ perl -E '
    use strict;
    use warnings;
    say q{\x{1f436} = }, "\x{1f436}";
    say q{\x{1F436} = }, "\x{1F436}";
    say q{\N{DOG FACE} = }, "\N{DOG FACE}";
'
Wide character in say at -e line 4.
\x{1f436} = 🐶
Wide character in say at -e line 5.
\x{1F436} = 🐶
Wide character in say at -e line 6.
\N{DOG FACE} = 🐶

And note that all of those "Wide character" warnings remain if I use the utf8 pragma:

$ perl -E '
    use strict;
    use warnings;
    use utf8;
    say q{\x{1f436} = }, "\x{1f436}";
    say q{\x{1F436} = }, "\x{1F436}";
    say q{\N{DOG FACE} = }, "\N{DOG FACE}";
    say q{🐶 = }, "🐶";
'
Wide character in say at -e line 5.
\x{1f436} = 🐶
Wide character in say at -e line 6.
\x{1F436} = 🐶
Wide character in say at -e line 7.
\N{DOG FACE} = 🐶
Wide character in say at -e line 8.
Wide character in say at -e line 8.
🐶 = 🐶
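The output side can be looked at in isolation. The following is a rough sketch, not part of the examples above, which uses the core Encode module to show the octets that a UTF-8 output layer has to produce for that one character. It assumes Perl 5.16 or later, where "\N{...}" loads charnames automatically. The source is pure 7-bit ASCII, so there is no "use utf8;".

```perl
use strict;
use warnings;
use Encode qw{encode};

# One character, written with 7-bit ASCII source code only.
my $dog = "\N{DOG FACE}";

# The octets that represent that character in UTF-8.
my $bytes = encode('UTF-8', $dog);

# Only ASCII is printed here, so no "Wide character" warning arises.
printf "characters: %d\n", length $dog;                    # characters: 1
printf "octets:     %d\n", length $bytes;                  # octets:     4
printf "hex:        %s\n", join ' ', unpack '(H2)*', $bytes;   # hex:        f0 9f 90 b6
```

Without an encoding layer, "say" is handed the one wide character and has to guess how to write it, which is what the "Wide character" warning is reporting.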

As further examples, see "uparse - Parse Unicode strings", and its improved version "Re: Decoding @ARGV [Was: uparse - Parse Unicode strings]", both of which read and write Unicode characters and use the open pragma, but neither contains Unicode characters in the source code so neither uses the utf8 pragma.

[Aside: I'm going away for Christmas and won't be touching any computer equipment. Christmas Day is less than two hours away in my time zone. If you have further questions regarding this, I won't get to them for a few days; although, someone else might.]

— Ken


In reply to Re^5: Another Unicode/emoji question by kcott
in thread Another Unicode/emoji question by Bod
