in reply to use locale behavior depends on charset of locale?
A note before starting: IIRC, use open doesn't work too well with ARGV (<> is short for <ARGV>), but it works here since you end up reading from STDIN (not a file named on the command line).
I'm having problems getting any locale except en_CA.utf8 working, so I'll use that one. First, I changed your program somewhat:
use open qw/:locale/;

BEGIN { require locale; locale->import() if shift; }

use Data::Dumper qw( Dumper );

sub dump_str {
    my ($s) = @_;
    my $internal_enc = utf8::is_utf8($s) ? "utf8" : "iso-latin-1";
    local $Data::Dumper::Useqq  = 1;
    local $Data::Dumper::Terse  = 1;
    local $Data::Dumper::Indent = 0;
    print("Input = ", Dumper($s), " [$internal_enc]\n");
}

print(join(':', PerlIO::get_layers(STDIN)), "\n");

my $s = <>;
dump_str($s);

print "Outside char class: ", $s =~ m/\w/   ? "" : "no ", "match\n";
print "Inside char class:  ", $s =~ m/[\w]/ ? "" : "no ", "match\n";
$ perl -e'binmode STDOUT, ":encoding(UTF-8)"; print chr 0xC9' | LANG=en_CA.utf8 perl a.pl 0
unix:perlio:utf8
Input = "\x{c9}" [utf8]
Outside char class: match
Inside char class:  match

$ perl -e'binmode STDOUT, ":encoding(UTF-8)"; print chr 0xC9' | LANG=en_CA.utf8 perl a.pl 1
unix:perlio:utf8
Input = "\x{c9}" [utf8]
Outside char class: no match
Inside char class:  match
(5.8.8 on Debian)
The important addition is the display of the internal encoding of the input. Match operations base some of their behaviour on the internal encoding of the string being matched.
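To make the point about internal encoding concrete, here is a minimal sketch (not part of the program above) that takes the same character, U+00C9, in both internal formats and tries \w against each. The helper name describe is made up for illustration. It assumes no locale and no unicode_strings feature are in effect; under those default rules I would expect the two lines to disagree, but check it on your own perl rather than taking my word for it.

```perl
use strict;
use warnings;

# Hypothetical helper: report a string's internal format and whether \w matches it.
sub describe {
    my ($label, $s) = @_;
    my $internal_enc = utf8::is_utf8($s) ? "utf8" : "iso-latin-1";
    my $matched      = $s =~ /\w/ ? "match" : "no match";
    return "$label [$internal_enc]: $matched";
}

my $down = chr(0xC9);    # U+00C9, stored internally as a single byte
my $up   = $down;
utf8::upgrade($up);      # same character, now stored internally as UTF-8

print describe("downgraded", $down), "\n";
print describe("upgraded",   $up),   "\n";

# The two strings are still the same string as far as eq is concerned.
print "eq: ", ($down eq $up ? "same" : "different"), "\n";
```

utf8::upgrade and utf8::downgrade change only the storage format, never the characters, so any difference between the two lines comes from the regex engine looking at the internal encoding.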
Re^2: use locale behavior depends on charset of locale?
by ikegami (Patriarch) on Jul 10, 2009 at 16:02 UTC
by ig (Vicar) on Jul 10, 2009 at 17:04 UTC
by zwon (Abbot) on Jul 10, 2009 at 18:02 UTC
by ikegami (Patriarch) on Jul 10, 2009 at 18:35 UTC