in reply to Re^5: Default encoding rules leave me puzzled...
in thread Default encoding rules leave me puzzled...

I remembered something.
perl -MScalar::Util=looks_like_number -wE 'use utf8; say looks_like_number("ç")? "yes" : "no"'

Re^7: Default encoding rules leave me puzzled...
by Jim (Curate) on Jun 22, 2014 at 19:05 UTC

    What output does that Perl command-line script produce?

    C:\>chcp
    Active code page: 437

    C:\>perl -MScalar::Util=looks_like_number -wE "use utf8; say looks_like_number('ç')? 'yes' : 'no'"
    no

    C:\>bash
    $ perl -MScalar::Util=looks_like_number -wE 'use utf8; say looks_like_number("ç") ? "yes" : "no"'
    Malformed UTF-8 character (1 byte, need 3, after start byte 0xe7) at -e line 1.
    no
    $ exit

    C:\>

    By posting a command-line script and then not posting the output it produces, you've made no useful point—at least not one that's immediately understandable.

      If you saw ç in a console set to cp437, you didn't actually have ç in the script because the code is treated as being UTF-8.

      Other than properly encoding the ç, you can address that issue by replacing ç with chr(0xE7). It will still output no.
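The chr(0xE7) suggestion above can be sketched as a small script. This is only an illustration, assuming a stock Scalar::Util; it avoids any literal non-ASCII character in the source, so the console code page no longer matters. The variable names are mine, not from the thread.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# chr(0xE7) is U+00E7 (c with cedilla) no matter how the console
# encodes the command line, so neither `use utf8` nor a literal
# non-ASCII character is needed in the source.
my $c = chr(0xE7);

# A lone letter is not a number, so this takes the "no" branch.
my $answer = looks_like_number($c) ? "yes" : "no";
print "$answer\n";
```

Because the character is built from its code point, the "Malformed UTF-8 character" warning seen in the bash run should not appear here.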

Re^7: Default encoding rules leave me puzzled...
by Anonymous Monk on Jun 21, 2014 at 13:37 UTC
    OMG so that's probably what Perl actually does.
    Converts in-place the internal representation of the string from UTF-X to the equivalent octet sequence in the native encoding (Latin-1 or EBCDIC).
    Yeah, binary print works exactly like utf8::downgrade.
      Except it doesn't convert in place...
      perl -MEncode=encode -wE 'use utf8; my $c = q(Français); say $c; say encode("utf-8", $c)'
      It encodes the string to Latin-1. Or EBCDIC. Case closed.

        Huh?

        $ chcp
        Active code page: 437
        $ perl -MEncode=encode -wE 'use utf8; my $c = q(Français); say $c; say encode("utf-8", $c)'
        Wide character in say at -e line 1.
        Franτais
        Fran�ais
        $
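The exchange above about binary print versus utf8::downgrade can be checked with a rough sketch like the following. It assumes an in-memory filehandle with no encoding layer behaves like a plain binary handle; the variable names are hypothetical. It looks at the string's internal UTF-8 flag before and after printing, then after an explicit utf8::downgrade.

```perl
use strict;
use warnings;

# Build "Francais" with U+00E7 from its code point, then force the
# internal UTF-8 representation so there is something to downgrade.
my $s = "Fran" . chr(0xE7) . "ais";
utf8::upgrade($s);
my $before = utf8::is_utf8($s) ? 1 : 0;

# Print to an in-memory handle that has no encoding layer.
my $buf = '';
open(my $fh, '>', \$buf) or die "open: $!";
binmode($fh);
print {$fh} $s;
close($fh);

# The printed data is one byte per character (the native encoding),
# but $s itself keeps its internal representation.
my $after = utf8::is_utf8($s) ? 1 : 0;
my $len   = length($buf);
my $byte  = ord(substr($buf, 4, 1));

# utf8::downgrade, by contrast, converts $s in place.
utf8::downgrade($s);
my $down = utf8::is_utf8($s) ? 1 : 0;

printf "before=%d after=%d bytes=%d byte5=0x%02X downgraded=%d\n",
    $before, $after, $len, $byte, $down;
```

If the assumptions hold, the bytes written match what utf8::downgrade would produce, while the original scalar is left untouched by print.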