use feature 'unicode_strings'; is an excellent point!
Since neither "use feature 'unicode_strings';" nor e.g. "use 5.016;" was declared, lc does exactly as described above.
If by "described above" you mean "Only the characters A-Z change, to a-z respectively.", then I think your reading of the lc docs might be a little off: my understanding is that bytes semantics are not the default behavior. The following test took a little fiddling to get the right values, but it passes on Perl releases from 5.8.1, 5.8.9, and 5.10.1 up to 5.26, and it shows the differences:
use warnings;
use strict;
use utf8;
use Test::More;
diag explain "Perl $]";
plan tests => $] ge '5.012' ? 15 : 11;
SKIP: {
    is "\N{U+00C9}", "É", '\N{U+...} escape';
    skip 'Perl ge 5.12 required', 1 unless $] ge '5.012';
    ok utf8::is_utf8("\N{U+00C9}"), '\N{U+...} sets UTF8';
}
{
    ok !utf8::is_utf8("\x{C9}"), '\x doesn\'t set UTF8';
    is lc("\x{C9}"), "\xC9", 'lc on non-UTF8 str';
    ok utf8::is_utf8("É"), 'str is UTF8';
    is lc("É"), "é", 'lc on UTF8 str';
}
{
    use bytes;
    ok !utf8::is_utf8("\x{C9}"), 'bytes: \x doesn\'t set UTF8';
    is lc("\x{C9}"), "\xC9", 'bytes: lc on non-UTF8 str';
    ok utf8::is_utf8("É"), 'bytes: str is UTF8';
    is lc("É"), $] lt '5.008009' ? "\xC9" : "\xC3\x89",
        'bytes: lc on UTF8 str';
}
SKIP: {
    skip 'Perl ge 5.12 required', 1 unless $] ge '5.012';
    ok eval q{ do {
        use feature 'unicode_strings';
        ok !utf8::is_utf8("\x{C9}"), 'u_s: \x doesn\'t set UTF8';
        is lc("\x{C9}"), "é", 'u_s: lc on non-UTF8 str';
        ok utf8::is_utf8("É"), 'u_s: str is UTF8';
        is lc("É"), "é", 'u_s: lc on UTF8 str';
    1 } }, 'unicode_strings works' or warn $@
}
You are right, I was too quick to paste a quote from the lc documentation page; it should have been the last, fall-through case.
Also, kurisuto, my comment was not a solution but rather an attempt to explain (to myself) what was happening; I too rarely deal with extended-ASCII strings that are not UTF-8. The proper fix (at least for anything but one-off scripts) would be to always explicitly decode inputs from all sources, rather than hoping that they are Latin-1 only and that Perl silently does "the right thing" in the background.
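To make the "always decode explicitly" advice concrete, here is a minimal sketch using the core Encode module. The byte strings are hypothetical stand-ins for data read from a file or socket, and the encoding names are assumptions about what such a source might use; this is not taken from kurisuto's actual script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the raw bytes of "É" as they might arrive from outside,
# once encoded as Latin-1 and once as UTF-8 (assumed encodings).
my $latin1_bytes = "\xC9";
my $utf8_bytes   = "\xC3\x89";

# Decode explicitly, naming the encoding we assume each source uses.
my $from_latin1 = decode('ISO-8859-1', $latin1_bytes);
my $from_utf8   = decode('UTF-8',      $utf8_bytes);

# Both are now the same one-character string, so lc applies Unicode rules
# without relying on 'unicode_strings' or on guessing what the bytes meant.
my $lower = lc $from_utf8;
printf "U+%04X\n", ord $lower;    # prints U+00E9

# Encode again only at the output boundary.
my $out_bytes = encode('UTF-8', $lower);
```

The point of the sketch is that, once the input has been decoded, the same lc call gives the same answer whichever encoding the bytes originally arrived in.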
Aha! Adding "use feature 'unicode_strings';" fixed the problem.
Thank you!