Re^6: Default encoding rules leave me puzzled...

Re^7: Default encoding rules leave me puzzled... by Jim (Curate) on Jun 22, 2014 at 19:05 UTC
What output does that Perl command-line script produce?
`C:\>chcp
Active code page: 437
C:\>perl -MScalar::Util=looks_like_number -wE "use utf8; say looks_like_number('ç')? 'yes' : 'no'"
no
C:\>bash
$ perl -MScalar::Util=looks_like_number -wE 'use utf8; say looks_like_number("ç") ? "yes" : "no"'
Malformed UTF-8 character (1 byte, need 3, after start byte 0xe7) at -e line 1.
no
$ exit
C:\>`
By posting a command-line script and then not posting the output it produces, you've made no useful point, at least not one that's immediately understandable.
Re^8: Default encoding rules leave me puzzled... by ikegami (Patriarch) on Jun 25, 2014 at 17:31 UTC
If you saw `ç` in a console set to cp437, you didn't actually have `ç` in the script, because the code is treated as being UTF-8. Other than properly encoding the `ç`, you can address that issue by replacing `ç` with `chr(0xE7)`. It will still output `no`.
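To make the `chr(0xE7)` suggestion concrete, here is a minimal sketch. The helper name `verdict` is made up for illustration. Building the character from its code point means the result no longer depends on the console code page or on how the one-liner's bytes are interpreted.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: mirrors the yes/no answer printed by the one-liner.
sub verdict {
    my ($s) = @_;
    return looks_like_number($s) ? 'yes' : 'no';
}

# chr(0xE7) is U+00E7 (c with cedilla), built from its code point rather
# than from whatever bytes the shell passed on the command line.
my $c = chr(0xE7);

printf "U+%04X: %s\n", ord($c), verdict($c);   # U+00E7: no
printf "%s: %s\n", '42', verdict('42');        # 42: yes
```

No `use utf8` is needed here, since the source contains only ASCII.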
Re^7: Default encoding rules leave me puzzled... by Anonymous Monk on Jun 21, 2014 at 13:37 UTC
OMG, so that's probably what Perl actually does: it converts the internal representation of the string in place, from UTF-X to the equivalent octet sequence in the native encoding (Latin-1 or EBCDIC). Yeah, binary print works exactly like utf8::downgrade.
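For anyone who wants to poke at that comparison, here is a rough sketch of what `utf8::downgrade` yields. It only illustrates the downgrade step; it doesn't prove anything about what print does internally. The helper name `native_octets` is invented for this example, and it works on a copy so the caller's string is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: return the octets a string downgrades to, or undef
# if it contains a character above 0xFF. The copy in $s is what gets
# downgraded, so the caller's string keeps its internal representation.
sub native_octets {
    my ($s) = @_;
    return utf8::downgrade($s, 1) ? $s : undef;   # true 2nd arg: fail quietly
}

my $s = "Fran\x{E7}ais";
utf8::upgrade($s);                     # force the UTF-8 internal form

my $octets = native_octets($s);
print join(' ', map { sprintf '%02X', ord } split //, $octets), "\n";
# 46 72 61 6E E7 61 69 73

print "original is still upgraded\n" if utf8::is_utf8($s);

# A character above 0xFF has no single-octet form, so the downgrade fails.
print defined(native_octets("\x{263A}")) ? "downgraded\n" : "cannot downgrade\n";
```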
Re^8: Default encoding rules leave me puzzled... by Anonymous Monk on Jun 21, 2014 at 13:48 UTC
Except it doesn't convert in place... `perl -MEncode=encode -wE 'use utf8; my $c = q(Français); say $c; say encode("utf-8", $c)'` It encodes the string to Latin-1. Or EBCDIC. Case closed.
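A small sketch of the difference, for reference. It uses a `\x{E7}` escape instead of a literal character so the source encoding doesn't matter, and the `hex_octets` helper is made up for this example. It shows that `encode` returns a new octet string and that the UTF-8 and Latin-1 encodings of the same text differ by one byte.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_octets {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $c = "Fran\x{E7}ais";                 # 8 characters

my $utf8   = encode('UTF-8', $c);        # c-cedilla becomes C3 A7
my $latin1 = encode('ISO-8859-1', $c);   # c-cedilla becomes E7

print hex_octets($utf8),   "\n";   # 46 72 61 6E C3 A7 61 69 73
print hex_octets($latin1), "\n";   # 46 72 61 6E E7 61 69 73

# Called without a CHECK argument, encode leaves its input untouched.
print length($c), "\n";            # 8
```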
Re^9: Default encoding rules leave me puzzled... by Jim (Curate) on Jun 22, 2014 at 19:08 UTC
Huh?
$ chcp
Active code page: 437
$ perl -MEncode=encode -wE 'use utf8; my $c = q(Français); say $c; say encode("utf-8", $c)'
Wide character in say at -e line 1.
Franτais
Fran∩┐╜ais
$
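One way to read that output is to look at which bytes a cp437 console would render as those glyphs. This is a sketch only. It assumes the second line contains the UTF-8 encoding of U+FFFD (the replacement character), which is a guess based on the glyphs and not something the transcript states. The helper name `as_cp437` is invented.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: list the code points a cp437 console would display
# for the given octets.
sub as_cp437 {
    my ($octets) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, decode('cp437', $octets);
}

# Byte 0xE7 is Latin-1 c-cedilla, but cp437 shows it as Greek tau.
print as_cp437("\xE7"), "\n";                        # U+03C4

# Assumption: the three glyphs are U+FFFD encoded as UTF-8 (EF BF BD),
# displayed byte by byte in cp437.
print as_cp437(encode('UTF-8', "\x{FFFD}")), "\n";   # U+2229 U+2510 U+255C
```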