trizen has asked for the wisdom of the Perl Monks concerning the following question:
I have a question about some very strange behavior involving the bytes pragma and two UTF-8 strings.
Below is the code that illustrates the problem:
    use utf8;
    use 5.010;
    use strict;
    use warnings;

    binmode(STDOUT, ':encoding(UTF-8)');

    sub get_bytes {
        my ($string) = @_;
        use bytes;
        map { bytes::ord bytes::substr($string, $_, 1) } 0 .. bytes::length($string) - 1;
    }

    my $s1 = "møøse";
    my $s2 = "m\xF8\xF8se";

    say $s1;
    say $s2;
    say "Equal: ", $s1 eq $s2;

    say join(" ", get_bytes($s1));
    say join(" ", get_bytes($s2));
The output (with perl-5.22.0):
    møøse
    møøse
    Equal: 1
    109 195 184 195 184 115 101
    109 248 248 115 101
As shown, the two strings print identically and compare equal, but get_bytes returns a different byte sequence for each of them.
Is this a bug, or am I missing something important in this conversion? Thanks!
Update: by replacing "m\xF8\xF8se" with decode_utf8(encode_utf8("m\xF8\xF8se")) it seems to work as expected.
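For anyone wanting to poke at the update above, here is a minimal sketch that prints Perl's internal UTF-8 flag for each string next to its UTF-8 octets. It assumes the core Encode module; utf8_octets is a hypothetical helper name that is not part of the original code, and $s3 is just a label for the round-tripped string. It is meant as an illustration of one way to compare the strings, not as the definitive explanation.

```perl
use utf8;
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helper (not from the original post): return the UTF-8 octets
# of a character string by encoding it explicitly, rather than peeking at
# Perl's internal buffer with the bytes pragma.
sub utf8_octets {
    my ($string) = @_;
    return unpack 'C*', encode_utf8($string);
}

my $s1 = "møøse";                          # literal from the question
my $s2 = "m\xF8\xF8se";                    # escaped form from the question
my $s3 = decode_utf8(encode_utf8($s2));    # the round trip from the update

for my $pair ([s1 => $s1], [s2 => $s2], [s3 => $s3]) {
    my ($name, $string) = @$pair;
    printf "%s: internal UTF-8 flag %s, octets: %s\n",
        $name,
        (utf8::is_utf8($string) ? "on" : "off"),
        join(" ", utf8_octets($string));
}
```

Encoding explicitly should give the same octet list for all three strings, since they hold the same characters; only the flag column is expected to differ between $s2 and the other two.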
Replies are listed 'Best First'.
Re: UTF-8 strings and the bytes pragma
by choroba (Cardinal) on Jun 19, 2015 at 15:30 UTC
Re: UTF-8 strings and the bytes pragma
by ikegami (Patriarch) on Jun 19, 2015 at 18:44 UTC
Re: UTF-8 strings and the bytes pragma
by Anonymous Monk on Jun 19, 2015 at 17:14 UTC
by trizen (Hermit) on Jun 19, 2015 at 17:37 UTC
by Anonymous Monk on Jun 19, 2015 at 17:54 UTC