    #!perl
    use strict;
    use warnings;
    use Encode;

    # Encode output as cp1252 to match the terminal.
    binmode STDOUT, ":encoding(cp1252)";

    my $pattern = qr/\A\w+\z/;

    # Decode the cp1252 literals into Perl character strings.
    my @words = map { decode( "cp1252", $_ ) } qw( Tšekissä Žena Œdipus Rex );

    for my $word (@words) {
        my $result = $word =~ $pattern ? "matches" : "doesn't match";
        printf qq/The word "%s" %s the pattern %s\n/, $word, $result, $pattern;
    }

When I run that in a terminal that is using cp1252 (aka "Windows Latin1"), the resulting output is:

    The word "Tšekissä" matches the pattern (?-xism:\A\w+\z)
    The word "Žena" matches the pattern (?-xism:\A\w+\z)
    The word "Œdipus" matches the pattern (?-xism:\A\w+\z)
    The word "Rex" matches the pattern (?-xism:\A\w+\z)

UPDATE: To clarify, the point here is that when it comes to matching things outside the ASCII range, regex constructs like '\w' will only employ Unicode semantics, not cp1252 or any other encoding's semantics, so they need to operate on strings that have their perl-internal utf8 flag set (i.e. strings that have been decoded from their "external" form, whether by reading through the appropriate I/O layer or by explicit decoding).
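To make the point of the UPDATE concrete, here is a minimal sketch that runs the same pattern against one word before and after decoding. The helper name is_word and the variables $bytes and $chars are made up for illustration; the byte 0x8E is "Ž" in cp1252, and the script assumes no locale or unicode_strings feature is in effect.

```perl
#!perl
use strict;
use warnings;
use Encode qw(decode);

# Raw cp1252 bytes for "Žena": 0x8E is "Ž" in cp1252.
my $bytes = "\x8Eena";

# Decoding yields a Perl character string starting with U+017D.
my $chars = decode( "cp1252", $bytes );

# Hypothetical helper: true if the whole string consists of \w characters.
sub is_word {
    my ($string) = @_;
    return $string =~ /\A\w+\z/ ? 1 : 0;
}

printf "undecoded bytes: %s\n", is_word($bytes) ? "matches" : "doesn't match";
printf "decoded string:  %s\n", is_word($chars) ? "matches" : "doesn't match";
```

The undecoded bytes should fail to match, because the regex engine never treats 0x8E as cp1252 "Ž", while the decoded string should match, because U+017D is a letter under Unicode rules.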
In reply to Re: Windows-1252 characters from \x{0080} thru \x{009f}
by graff
in thread Windows-1252 characters from \x{0080} thru \x{009f}
by Jim