in reply to UTF-8 strings and the bytes pragma

Perl uses two internal encodings for strings: one is similar to UTF-8 (with some differences), and the other is not. Which one is in use is supposed to be completely transparent to the programmer. use utf8 only tells the compiler that the source code is in UTF-8; it says nothing about how strings are stored internally.
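
To illustrate the transparency, here is a minimal sketch (not from the original post). It assumes a string whose characters all have code points up to 255, so Perl can hold it in either internal form; the variable names are made up for the example. utf8::upgrade and utf8::is_utf8 are built-in functions that work without loading the utf8 pragma.

```perl
use strict;
use warnings;

# Hypothetical example string: every character fits in one byte,
# so Perl initially stores it in the non-UTF-8 internal form.
my $native = "caf\x{E9}";

# Copy it and force the copy into the UTF-8-like internal form.
my $upgraded = $native;
utf8::upgrade($upgraded);

# The internal flag differs between the two copies ...
printf "native flag:   %d\n", utf8::is_utf8($native)   ? 1 : 0;
printf "upgraded flag: %d\n", utf8::is_utf8($upgraded) ? 1 : 0;

# ... but at the character level they are the same string.
printf "equal:  %s\n", $native eq $upgraded ? "yes" : "no";
printf "length: %d / %d\n", length($native), length($upgraded);
```

The flags print as 0 and 1, while the comparison reports "yes" and both lengths are 4. The storage differs, but the string seen by the program does not.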

Re^2: UTF-8 strings and the bytes pragma
by trizen (Hermit) on Jun 19, 2015 at 17:37 UTC

    In my opinion, this may lead to some inconsistencies.

    For example, given:

    my $s1 = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
    my $s2 = "\x{1D518}\x{1D52B}\x{1D526}\x{1D520}\x{1D52C}\x{1D521}\x{1D522}";
    

    the bytes are the same:

    240 157 148 152 240 157 148 171 240 157 148 166 240 157 148 160 240 157 148 172 240 157 148 161 240 157 148 162
    240 157 148 152 240 157 148 171 240 157 148 166 240 157 148 160 240 157 148 172 240 157 148 161 240 157 148 162
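
One way the two byte lists above could be reproduced is sketched here. This is an assumption about the setup, not necessarily how the poster produced them: it assumes the script is saved as UTF-8 with use utf8 in effect, and it uses encode from the core Encode module to get the UTF-8 octets of each string.

```perl
use strict;
use warnings;
use utf8;                  # assumption: this source file is saved as UTF-8
use Encode qw(encode);

my $s1 = "𝔘𝔫𝔦𝔠𝔬𝔡𝔢";
my $s2 = "\x{1D518}\x{1D52B}\x{1D526}\x{1D520}\x{1D52C}\x{1D521}\x{1D522}";

for my $s ($s1, $s2) {
    # Encode the character string to UTF-8 octets, then list them in decimal.
    my $octets = encode('UTF-8', $s);
    print join(' ', unpack('C*', $octets)), "\n";
}

# Under these assumptions both are the same 7-character string.
print $s1 eq $s2 ? "equal\n" : "not equal\n";
```

Without use utf8, the literal in $s1 would instead be read as 28 separate one-byte characters, so the result depends on that assumption.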
    

    I think it would be nice to have a way to automatically convert literal strings with hex escapes like "\x{...}" into UTF-8 strings.

      In my opinion, this may lead to some inconsistencies.
      Here's a relatively recent discussion about it on the p5p mailing list: http://www.nntp.perl.org/group/perl.perl5.porters/2015/01/msg224867.html