Re: inconsistency in whitespace handling

Interestingly enough, if you use a named property class such as \p{Space} instead of the \s escape, it matches correctly in both strings.

$ascii = "\xa0\x{a0}";
$unicode = "\x{100}\xa0\x{a0}";

print "Latin-1 \\s : ",  $ascii =~  /\s/ ? "yes":"no","\n";
print "Latin-1 \\p{Space}: ",  $ascii =~  /\p{Space}/ ? "yes":"no","\n\n";


print "Unicode \\s: ",$unicode =~ /\s/ ? "yes":"no","\n";
print "Unicode \\p{Space}: ",$unicode =~ /\p{Space}/ ? "yes":"no","\n";

I certainly wouldn't expect non-breaking space to be recognized or not as a space depending on what else was in the string.

(I even experimented with whether the enclosing brackets were significant... apparently not.)
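To isolate the "depends on what else is in the string" effect, here is a minimal sketch that takes the same non-breaking space and changes only its internal representation. The helper name `has_space` is made up for illustration, and the sketch assumes a perl running with its default regex rules (no `use feature 'unicode_strings'`, no `/u` modifier); with those enabled the difference should go away.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether \s matches anywhere in the string,
# under whatever semantics perl picks for that particular string.
sub has_space {
    my ($str) = @_;
    return $str =~ /\s/ ? "yes" : "no";
}

my $native   = "\xa0";      # NBSP stored as a single byte, UTF8 flag off
my $upgraded = "\xa0";
utf8::upgrade($upgraded);   # same character, but internal UTF8 flag now on

# The two strings compare equal, yet \s may treat them differently.
print "strings equal: ", ($native eq $upgraded ? "yes" : "no"), "\n";
print "native   \\s:   ", has_space($native),   "\n";
print "upgraded \\s:   ", has_space($upgraded), "\n";
```

If the two `\s` lines disagree, the internal encoding alone is what decides the match, not the characters in the string.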

Re^2: inconsistency in whitespace handling by Fletch (Bishop) on May 12, 2005 at 14:45 UTC
I think this is because, underneath, the `\p{Foo}` construct generates a different regex opcode, which calls through the UTF-8 routines even if the source string isn't marked as UTF-8.
Re^3: inconsistency in whitespace handling by demerphq (Chancellor) on May 13, 2005 at 11:18 UTC
You'd be correct: UTF-8 in either the text being matched or the pattern causes UTF-8 semantics to apply to the whole regex. A good example of the oddness this causes is the differing handling of the German sharp S. If you use extended ASCII, a case-insensitive pattern will not match 'ss'; if you use UTF-8, it will. :-)
---
$world=~s/war/peace/g
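A rough sketch of the sharp S case, for anyone who wants to try it. It assumes the test is the pattern `/ss/i` against a string holding a single sharp S (the post doesn't say which side holds which), and that perl is using its default regex rules rather than `/u` or `unicode_strings`. The helper name `matches_ss` is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive match against the literal "ss".
sub matches_ss {
    my ($str) = @_;
    return $str =~ /ss/i ? "yes" : "no";
}

my $native   = "\xdf";      # sharp S as a single Latin-1 byte, UTF8 flag off
my $upgraded = "\xdf";
utf8::upgrade($upgraded);   # same character, stored internally as UTF-8

print "native   /ss/i: ", matches_ss($native),   "\n";
print "upgraded /ss/i: ", matches_ss($upgraded), "\n";
```

As with the whitespace case, both strings hold the same character, so any difference in the output comes purely from the internal representation.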