A note before starting: IIRC, use open doesn't work well with ARGV (<> is short for <ARGV>), but it works here because you end up reading from STDIN rather than from a file named on the command line.
I'm having problems getting any locale except en_CA.utf8 working, so I'll use that one. First, I changed your program somewhat:
use open qw/:locale/;

BEGIN { require locale; locale->import() if shift; }

use Data::Dumper qw( Dumper );

sub dump_str {
    my ($s) = @_;
    my $internal_enc = utf8::is_utf8($s) ? "utf8" : "iso-latin-1";
    local $Data::Dumper::Useqq  = 1;
    local $Data::Dumper::Terse  = 1;
    local $Data::Dumper::Indent = 0;
    print("Input = ", Dumper($s), " [$internal_enc]\n");
}

print(join(':', PerlIO::get_layers(STDIN)), "\n");

my $s = <>;
dump_str($s);

print "Outside char class: ", $s =~ m/\w/   ? "" : "no ", "match\n";
print "Inside char class: ",  $s =~ m/[\w]/ ? "" : "no ", "match\n";
$ perl -e'binmode STDOUT, ":encoding(UTF-8)"; print chr 0xC9' | LANG=en_CA.utf8 perl a.pl 0
unix:perlio:utf8
Input = "\x{c9}" [utf8]
Outside char class: match
Inside char class: match

$ perl -e'binmode STDOUT, ":encoding(UTF-8)"; print chr 0xC9' | LANG=en_CA.utf8 perl a.pl 1
unix:perlio:utf8
Input = "\x{c9}" [utf8]
Outside char class: no match
Inside char class: match
(5.8.8 on Debian)
The important addition is the display of the internal encoding of the input. Match operations base some of their behaviour on the internal encoding of the string being matched.
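To illustrate that last point in isolation, here is a minimal sketch (not part of the original program). It assumes a perl running without use locale and without the unicode_strings feature, so the regex uses the default matching rules. The describe helper and the variable names are made up for this example. It takes the same character, chr 0xC9, stores it in both internal formats, and checks \w against each:

```perl
use strict;
use warnings;

# Hypothetical helper: report the internal format of a string
# and whether \w matches it.
sub describe {
    my ($label, $str) = @_;
    my $internal = utf8::is_utf8($str) ? "utf8" : "iso-latin-1";
    my $matched  = $str =~ /\w/ ? "match" : "no match";
    return "$label [$internal]: $matched";
}

# Same character (U+00C9), two internal representations.
my $downgraded = "\xC9";
utf8::downgrade($downgraded);   # stored as a single byte
my $upgraded = "\xC9";
utf8::upgrade($upgraded);       # stored as UTF-8 internally

print describe("downgraded", $downgraded), "\n";
print describe("upgraded",   $upgraded),   "\n";

# The two strings still compare equal as far as Perl is concerned.
print "strings are ", ($downgraded eq $upgraded ? "equal" : "different"), "\n";
```

Under those assumptions, the two strings compare equal, yet \w should only match the one stored as utf8. Adding use locale or a newer feature bundle changes the rules, so treat this as a demonstration of the principle rather than a reproduction of the 5.8.8 results above.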
In reply to "Re: use locale behavior depends on charset of locale?" by ikegami, in thread "use locale behavior depends on charset of locale?" by Anonymous Monk