Re^4: Something strange in the world or Regexes

Re^5: Something strange in the world or Regexes by ikegami (Patriarch) on Sep 30, 2009 at 18:48 UTC
What you said is very misleading. `$ perl -le' $_ = "\xc2\xa0"; print /^\h$/ ? "h" : "not h"; '` prints `not h`. Of course, you are referring to the internal encoding. `$ perl -le' $_ = "\xA0"; utf8::downgrade $_; print /^\s$/ ? "s" : "not s"; print /^\h$/ ? "h" : "not h"; utf8::upgrade $_; print /^\s$/ ? "s" : "not s"; print /^\h$/ ? "h" : "not h"; '` prints `not s`, `h`, `s`, `h` (one per line). Unfortunately, that's irrelevant in the OP's case since he needs to decode his UTF-8 first, which will make the internal encoding UTF-8.
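To illustrate the decoding step mentioned above, here is a minimal sketch. It assumes the OP's input arrives as raw UTF-8 octets; the variable names are made up for the example, and `Encode::decode` is just one way to do the decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the input is the UTF-8 encoding of U+00A0 (NO-BREAK SPACE),
# held as two raw octets, as in the first one-liner.
my $bytes = "\xc2\xa0";

# Decoding turns the two octets into a single character, U+00A0.
my $chars = decode('UTF-8', $bytes);

# Two undecoded octets can never match a single-character pattern.
my $raw_h     = $bytes =~ /^\h$/ ? 1 : 0;
# After decoding, the string holds one character that \h matches.
my $decoded_h = $chars =~ /^\h$/ ? 1 : 0;
# The decoded string is stored internally as UTF-8, so \s matches it too.
my $decoded_s = $chars =~ /^\s$/ ? 1 : 0;

print "raw \\h: $raw_h, decoded \\h: $decoded_h, decoded \\s: $decoded_s\n";
```

The point of the sketch is only that, once the data has been decoded, the downgraded case from the second one-liner does not arise for this input.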
Re^5: Something strange in the world or Regexes by jakobi (Pilgrim) on Sep 30, 2009 at 11:47 UTC
Thanx for this nugget. It leads to the confusing situation that \s (all whitespace) matches less than `[\h\v]` (both horizontal and vertical WS) :).
Re^6: Something strange in the world or Regexes by JavaFan (Canon) on Sep 30, 2009 at 14:08 UTC
Yes, but at least this way \s is somewhat "fixed" without breaking code. "NEXT LINE" ("\x85") is matched by \s only in UTF-8 matching, but always by \v. And perhaps more importantly, a vertical tab (aka LINE TABULATION or "\x0b") is never matched by \s, but always by \v.
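A small sketch for checking these claims on whatever perl is at hand. The helper names `is_s` and `is_v` are invented for this example. The `\s` results are printed rather than asserted because they depend on the perl version: the thread describes the behaviour at the time, while later releases (5.18 onward, as far as I know) changed \s to match the vertical tab as well.

```perl
use strict;
use warnings;

# Hypothetical helpers: does a one-character string match \s or \v?
sub is_s { return $_[0] =~ /^\s$/ ? 1 : 0 }
sub is_v { return $_[0] =~ /^\v$/ ? 1 : 0 }

my @cases = (
    [ 'NEXT LINE',       "\x85" ],
    [ 'LINE TABULATION', "\x0b" ],
);

for my $case (@cases) {
    my ($name, $char) = @$case;

    # Same character, two internal representations.
    my $down = $char; utf8::downgrade($down);
    my $up   = $char; utf8::upgrade($up);

    for my $pair ([ downgraded => $down ], [ upgraded => $up ]) {
        my ($label, $str) = @$pair;
        printf "%-16s %-10s \\s=%d \\v=%d\n",
            $name, $label, is_s($str), is_v($str);
    }
}
```

If the thread's description holds, the \v column reads 1 in every row, while the \s column varies with the internal encoding and the perl version.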