and you get character semantics (not byte semantics) when doing stuff with that string

There's no such thing. If an operation behaves differently depending on the internal encoding of the string, it's a bug. These are being fixed: pack was fixed in 5.10.0, regex matches and others are being fixed for 5.12, and Text::CSV_XS was fixed in 0.46.

that's the point of using "decode()" and the encoding IO layer

Not at all. The point of decode is to decode characters. It has nothing to do with the internal storage of strings.

You can have decoded characters with the utf8 flag off.
You can have encoded characters with the utf8 flag on.

If you need to play with the internal encoding, utf8::upgrade and utf8::downgrade are the appropriate tools.

This is what the previously linked document shows.
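
To make the two statements about the utf8 flag concrete, here is a minimal sketch. The variable names are only illustrative. It uses the core functions utf8::upgrade, utf8::downgrade and utf8::is_utf8 to change and inspect the internal storage, and shows that the string itself (its length and its value) stays the same either way.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decoded text: the UTF-16LE bytes FE 00 are the single character U+00FE.
my $decoded = decode('UTF-16LE', "\xFE\x00");

# Two copies that differ only in internal storage.
my $down = $decoded;
utf8::downgrade($down);    # stored one byte per character, flag off
my $up = $decoded;
utf8::upgrade($up);        # stored as UTF-8 internally, flag on

printf "decoded, flag off: flag=%d length=%d\n", utf8::is_utf8($down) ? 1 : 0, length $down;
printf "decoded, flag on:  flag=%d length=%d\n", utf8::is_utf8($up)   ? 1 : 0, length $up;
print  "same string: ", ($down eq $up ? "yes" : "no"), "\n";

# Encoded text: the two UTF-8 bytes C3 BE. Upgrading changes the storage,
# but it is still a string of two bytes.
my $encoded = encode('UTF-8', "\x{FE}");
utf8::upgrade($encoded);
printf "encoded, flag on:  flag=%d length=%d\n", utf8::is_utf8($encoded) ? 1 : 0, length $encoded;
```

This should print a length of 1 for both decoded copies, "same string: yes", and a length of 2 for the encoded string even though its flag is on.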

the "perl-internal utf8" storage of characters in the range 0x80-0xFF is single-byte.

Impossible. In UTF-8, a byte with the high bit set is always part of a multi-byte sequence, so each character in the range 0x80-0xFF takes two bytes.

$ perl -MEncode -le'print length encode "utf8", decode "UTF-16le", "\xFE\x00"'
2

or

$ perl -MDevel::Peek -MEncode -le'Dump decode "UTF-16le", "\xFE\x00"'
...
  PV = 0x8172e78 "\303\276"\0 [UTF8 "\x{fe}"]
  CUR = 2
...

U+000000-U+00007F: One byte
U+000080-U+0007FF: Two bytes
U+000800-U+00FFFF: Three bytes
U+010000-U+10FFFF: Four bytes
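
The ranges above can be checked directly. The following is a small sketch, not taken from the original post: utf8_len is a hypothetical helper name, and it uses the core utf8::encode function (rather than Encode) to turn a one-character string into its UTF-8 bytes and count them.

```perl
use strict;
use warnings;

# Hypothetical helper: number of bytes in the UTF-8 encoding of a code point.
sub utf8_len {
    my ($cp) = @_;
    my $s = chr $cp;      # a string holding one character
    utf8::encode($s);     # replace it in place with its UTF-8 bytes
    return length $s;     # $s is now a byte string
}

# First and last code point of each range in the table.
for my $cp (0x00, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF) {
    printf "U+%06X: %d byte(s)\n", $cp, utf8_len($cp);
}
```

U+00FE falls in the second range, which is why the commands above report 2.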


In reply to Re^4: Parsing UTF-16LE CSV Records Using Text::CSV* by ikegami
in thread Parsing UTF-16LE CSV Records Using Text::CSV* by Jim
