As shown, the strings are equivalent in UTF-8, but they become different once they are converted into bytes.

The strings are the same, but they are stored differently.

Just like the number one can be stored as

IV = "\x00\x00\x00\x01" [+1]
or
NV = "\x00\x00\x00\x00\x00\x00\xF0\x3F" [+1.0 * 2**(1023-1023)]

The string møøse can be stored as

PV = "\x6D\xF8\xF8\x73\x65\x00" [møøse]
or
PV,UTF8 = "\x6D\xC3\xB8\xC3\xB8\x73\x65\x00" [møøse]
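To see the two storage formats side by side, here is a minimal sketch. The variable names `$down` and `$up` are just for illustration; `utf8::is_utf8` only reports which internal format a scalar currently uses.

```perl
use strict;
use warnings;

# "m\xF8\xF8se" is møøse written with code points below 256,
# so Perl stores this literal in the UTF8=0 format.
my $down = "m\xF8\xF8se";

# A copy of the same string, switched to the UTF8=1 storage format.
my $up = $down;
utf8::upgrade($up);

# The storage flag differs...
printf "down: UTF8=%d\n", utf8::is_utf8($down) ? 1 : 0;
printf "up:   UTF8=%d\n", utf8::is_utf8($up)   ? 1 : 0;

# ...but the strings are still the same five characters.
printf "equal: %s\n", $down eq $up ? "yes" : "no";
printf "lengths: %d and %d\n", length($down), length($up);
```

Correct code only looks at the characters (`eq`, `length`, and so on), so it gets the same answer for both scalars.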

Is this a bug?

The bug is the use of the bytes module. Never use the bytes module.

Any module that uses get_bytes or equivalent is buggy. If you want to convert the internal storage format of a scalar because you are dealing with a buggy module, you can use the builtins utf8::upgrade and utf8::downgrade.

# Converts $s to use the UTF8=1 storage format if it's not already.
utf8::upgrade($s);

# Converts $s to use the UTF8=0 storage format if it's not already.
# Dies if it can't.
utf8::downgrade($s);

# Converts $s to use the UTF8=0 storage format if it's not already.
# Returns false if it can't.
utf8::downgrade($s, 1);
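As a rough illustration of the two failure modes of utf8::downgrade, the sketch below uses a string containing a character above 255, which cannot be stored in the UTF8=0 format. The variable names are made up for the example.

```perl
use strict;
use warnings;

# U+263A is above 255, so this string can only be stored with UTF8=1.
my $wide = "\x{263A}";

# With a true second argument, utf8::downgrade returns false instead of dying.
my $ok = utf8::downgrade($wide, 1);
print "downgrade ", ($ok ? "succeeded" : "failed"), "\n";

# Without it, the failure is fatal, so trap it with eval.
my $died = !eval { utf8::downgrade($wide); 1 };
print "downgrade died: $@" if $died;

# A string whose characters are all below 256 downgrades fine.
my $narrow = "m\xF8\xF8se";
utf8::upgrade($narrow);
my $ok2 = utf8::downgrade($narrow, 1);
print "narrow downgrade ", ($ok2 ? "succeeded" : "failed"), "\n";
```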

But you hopefully never have to do that. If instead you are trying to convert a Unicode string to UTF-8 or vice-versa, you have a few options.

# From Unicode to UTF-8:
utf8::encode($s);

utf8::encode(my $utf8 = $uni);

use Encode qw( encode );
my $utf8 = encode('UTF-8', $uni);

use Encode qw( encode_utf8 );
my $utf8 = encode_utf8($uni);

# From UTF-8 to Unicode:
utf8::decode($s);

utf8::decode(my $uni = $utf8);

use Encode qw( decode );
my $uni = decode('UTF-8', $utf8);

use Encode qw( decode_utf8 );
my $uni = decode_utf8($utf8);

The utf8:: functions are built-in and work in-place. The Encode:: functions have more flexible error handling.
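The difference in error handling can be sketched as follows. This is only an illustration: the malformed input `$bad` is invented for the example, and `Encode::FB_CROAK` with `Encode::LEAVE_SRC` is just one of the check modes Encode offers.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Round trip: 5 characters become 7 bytes of UTF-8 and back again.
my $uni  = "m\xF8\xF8se";
my $utf8 = encode('UTF-8', $uni);
my $back = decode('UTF-8', $utf8);
printf "%d chars -> %d bytes -> %d chars\n",
    length($uni), length($utf8), length($back);

# A lone \xF8 byte is not valid UTF-8.
my $bad = "m\xF8se";

# utf8::decode works in place and simply returns false on malformed input.
my $copy = $bad;
my $ok   = utf8::decode($copy);
print "utf8::decode ", ($ok ? "succeeded" : "failed"), "\n";

# Encode::decode can be asked to throw an exception instead.
# LEAVE_SRC keeps decode from modifying $bad while checking it.
my $res = eval {
    decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
print "decode died: $@" unless defined $res;
```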


In reply to Re: UTF-8 strings and the bytes pragma by ikegami
in thread UTF-8 strings and the bytes pragma by trizen
