in reply to inconsistency in whitespace handling

Interestingly enough, if you use a named property class instead of the \s assertion, it matches correctly in both strings.

$ascii   = "\xa0\x{a0}";          # Latin-1 only: two non-breaking spaces
$unicode = "\x{100}\xa0\x{a0}";   # \x{100} forces the string to be Unicode
print "Latin-1 \\s : ", $ascii =~ /\s/ ? "yes":"no","\n";
print "Latin-1 \\p{Space}: ", $ascii =~ /\p{Space}/ ? "yes":"no","\n\n";
print "Unicode \\s: ",$unicode =~ /\s/ ? "yes":"no","\n";
print "Unicode \\p{Space}: ",$unicode =~ /\p{Space}/ ? "yes":"no","\n";

I certainly wouldn't expect non-breaking space to be recognized or not as a space depending on what else was in the string.

(I even experimented with whether the enclosing brackets were significant... apparently not.)
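
One way to check that the result depends on how the string is stored rather than on which characters it holds is to take the same one-character string twice and upgrade one copy with the core utf8::upgrade function. This is only a minimal sketch, not part of the original post: the variable names are made up, and it assumes Perl's default regex semantics (no `use feature 'unicode_strings'` and no /u, /a or /l modifier), so what it prints may vary between Perl versions.

```perl
use strict;
use warnings;

# Two copies of the same one-character string: U+00A0 NO-BREAK SPACE.
my $native   = "\xa0";
my $upgraded = "\xa0";

# utf8::upgrade changes only the internal storage (it sets the UTF8 flag);
# the string still compares equal to the untouched copy.
utf8::upgrade($upgraded);

printf "same characters:    %s\n", $native eq $upgraded        ? "yes" : "no";
printf "native   \\s:        %s\n", $native   =~ /\s/           ? "yes" : "no";
printf "upgraded \\s:        %s\n", $upgraded =~ /\s/           ? "yes" : "no";
printf "native   \\p{Space}: %s\n", $native   =~ /\p{Space}/    ? "yes" : "no";
printf "upgraded \\p{Space}: %s\n", $upgraded =~ /\p{Space}/    ? "yes" : "no";
```

If the two \s lines disagree while both \p{Space} lines say yes, that is the same inconsistency as above, reproduced without any other character in the string.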

Re^2: inconsistency in whitespace handling
by Fletch (Bishop) on May 12, 2005 at 14:45 UTC

    I think this is because underneath the \p{Foo} stuff generates a different regex opcode which calls through the utf routines even if the source string isn't marked as utf.

      You'd be correct: UTF8 in either the text being matched or the pattern causes UTF8 semantics to apply to the whole regex. A good example of the oddness this causes is the differing handling of the German sharp S. If you use extended ASCII, a case-insensitive pattern will not match it against 'ss'; if you use utf8, it will. :-)
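
      The sharp S case can be tried the same way. This is a rough sketch under assumptions, not code from the thread: the variable names are hypothetical, utf8::upgrade is used to get a UTF8-flagged copy of the same character, and the default regex semantics are assumed (no `use feature 'unicode_strings'`, no /u modifier). The answers it prints may differ between Perl versions.

      ```perl
      use strict;
      use warnings;

      my $native   = "\xdf";      # LATIN SMALL LETTER SHARP S, byte string
      my $upgraded = "\xdf";
      utf8::upgrade($upgraded);   # same character, UTF-8 internal storage

      # Run the same case-insensitive match against both copies.
      for my $pair ( [ native => $native ], [ upgraded => $upgraded ] ) {
          my ( $label, $str ) = @$pair;
          printf "%-8s /ss/i: %s\n", $label, $str =~ /ss/i ? "yes" : "no";
      }
      ```

      If the two lines differ, the only thing that changed between them is the internal encoding of the target string.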

      ---
      $world=~s/war/peace/g