    #!/usr/bin/perl -CO
    use strict;
    use warnings;
    use Encode;
    use utf8;

    # Case 1: character string, no locale
    my $a = 'ä';
    print "UTF8-Flag: ", utf8::is_utf8($a) ? "Yes" : "No";
    print " matches word: ", $a =~ /\w/ ? "Yes\n" : "No\n";

    # Case 2: ISO-8859-1 encoded byte string, no locale
    my $b = encode("ISO-8859-1", $a);
    print "UTF8-Flag: ", utf8::is_utf8($b) ? "Yes" : "No";
    print " matches word: ", $b =~ /\w/ ? "Yes\n" : "No\n";

    use locale;

    # Case 3: character string, locale in effect
    $a = 'ä';
    print "UTF8-Flag: ", utf8::is_utf8($a) ? "Yes" : "No";
    print " matches word: ", $a =~ /\w/ ? "Yes\n" : "No\n";

    # Case 4: ISO-8859-1 encoded byte string, locale in effect
    $b = encode("ISO-8859-1", $a);
    print "UTF8-Flag: ", utf8::is_utf8($b) ? "Yes" : "No";
    print " matches word: ", $b =~ /\w/ ? "Yes" : "No";
    print "\n";

The output on a Linux box with locale de_DE.UTF-8 and Perl source code encoded in UTF-8 is:

    UTF8-Flag: Yes matches word: Yes
    UTF8-Flag: No matches word: No
    UTF8-Flag: Yes matches word: No
    UTF8-Flag: No matches word: No

It's the very first case that I can't explain. Why does a Unicode-flagged 'ä' match \w when locale is not set explicitly?
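For anyone trying to reproduce this, here is a minimal sketch that takes locale out of the picture. It assumes Perl 5.14 or later, because it relies on the /a and /u regex modifiers, which are not part of my script above. The helper yn and the %result hash are just illustrative names. As far as I understand the documentation, /u forces Unicode rules and /a restricts \w to ASCII, so comparing both against the default should show which rule set a given string is getting.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # source literals are decoded to characters
use Encode qw(encode);

# Hypothetical helper: turn a match result into "Yes"/"No".
sub yn { return $_[0] ? "Yes" : "No" }

my $char  = 'ä';                           # character string, U+00E4
my $bytes = encode("ISO-8859-1", $char);   # byte string, single byte 0xE4

# No "use locale" anywhere in this sketch.
my %result = (
    char_default  => yn($char  =~ /\w/),    # default rules
    char_ascii    => yn($char  =~ /\w/a),   # ASCII-only \w
    bytes_default => yn($bytes =~ /\w/),    # default rules
    bytes_unicode => yn($bytes =~ /\w/u),   # Unicode rules forced
);

# Keys and values are plain ASCII, so no output layer is needed.
print "$_: $result{$_}\n" for sort keys %result;
```

If the default case agrees with /u for the flagged string and with /a for the byte string, that would suggest that the UTF8 flag alone selects the rule set when no locale is in effect.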
Thanks in advance
Andreas
In reply to Regex Matching Unicode and Regex Classes by McA