I have a question about a very strange behavior involving the bytes pragma and two UTF-8 strings.
Below is the code that illustrates the problem:
    use utf8;
    use 5.010;
    use strict;
    use warnings;

    binmode(STDOUT, ':encoding(UTF-8)');

    sub get_bytes {
        my ($string) = @_;
        use bytes;
        map { bytes::ord bytes::substr($string, $_, 1) } 0 .. bytes::length($string) - 1;
    }

    my $s1 = "møøse";
    my $s2 = "m\xF8\xF8se";

    say $s1;
    say $s2;
    say "Equal: ", $s1 eq $s2;
    say join(" ", get_bytes($s1));
    say join(" ", get_bytes($s2));
The output (with perl-5.22.0):
    møøse
    møøse
    Equal: 1
    109 195 184 195 184 115 101
    109 248 248 115 101
As shown, the two strings compare equal as character strings, but get_bytes returns different byte sequences for them.
Is this a bug, or am I missing something important in this conversion? Thanks!
Update: after replacing "m\xF8\xF8se" with decode_utf8(encode_utf8("m\xF8\xF8se")), it seems to work as expected (both strings yield the same bytes).
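For reference, here is a minimal sketch of a variant that does not rely on the bytes pragma at all. It assumes the core Encode module, and the helper name octets is made up for illustration. It encodes each string to UTF-8 explicitly before looking at the bytes, so the result should not depend on how Perl happens to store the string internally:

```perl
use utf8;
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: return the UTF-8 octets of a character string.
# The string is encoded explicitly instead of peeking at Perl's
# internal representation the way `use bytes` does.
sub octets {
    my ($string) = @_;
    return unpack "C*", encode_utf8($string);
}

my $s1 = "møøse";          # literal containing non-ASCII characters
my $s2 = "m\xF8\xF8se";    # same code points written as \x escapes

# utf8::is_utf8 reports the internal storage flag, which is what
# the bytes pragma ends up exposing.
printf "is_utf8: s1=%d s2=%d\n",
    (utf8::is_utf8($s1) ? 1 : 0), (utf8::is_utf8($s2) ? 1 : 0);

print join(" ", octets($s1)), "\n";    # 109 195 184 195 184 115 101
print join(" ", octets($s2)), "\n";    # 109 195 184 195 184 115 101
```

If I understand it correctly, this is also why the decode_utf8(encode_utf8(...)) round trip helps: it hands back the same characters, but stored internally as UTF-8, so the bytes view of both strings then matches.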
In reply to UTF-8 strings and the bytes pragma by trizen