in reply to regular expressions in unicode

which I would write as /\b\U/w{2,}\E\b\/

Um, just as an aside, that doesn't do what you think it does (even ignoring the typos). It's equivalent to

/\b\W{2,}\b/
Dave.
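One way to check this claim is to fix the typos in the quoted pattern and run both versions on a few strings. This is a minimal sketch that assumes a Perl 5 interpreter. The sample strings and variable names are made up for illustration. If the claim holds, both patterns give the same result for every string.

```perl
use strict;
use warnings;

# The quoted pattern with the typos fixed. \U upper-cases the pattern
# text up to \E, so "\w{2,}" is turned into "\W{2,}" before the regex
# engine sees it.
my $with_u  = qr/\b\U\w{2,}\E\b/;
my $plain_w = qr/\b\W{2,}\b/;

for my $str ("HELLO", "hello", "a, b") {
    my $r1 = $str =~ $with_u  ? "match" : "no match";
    my $r2 = $str =~ $plain_w ? "match" : "no match";
    print "$str => with \\U: $r1, with \\W: $r2\n";
}
```

Neither pattern matches "HELLO" or "hello", because they contain no run of non-word characters. Both patterns match the ", " in "a, b".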

Re^2: regular expressions in unicode
by Anonymous Monk on Jan 25, 2007 at 15:22 UTC
    Sorry for the typos.
    ... I see that, but why? I thought \U was an escape sequence that converts the characters following it to uppercase, so I expected \U\w{2,} to match only uppercase characters. \U[A-Za-z0-9_]{2,} does exactly that, and I thought \w was a shortcut for that character class (roughly speaking, leaving aside locale settings, Unicode, etc.).

    Thanks
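The explicit class only appears to behave as expected because of the same mechanism. \U rewrites the pattern text, so the a-z range is upper-cased into a second A-Z range. The following is a small sketch, assuming that \U applies the same upper-casing as Perl's built-in uc(). It uses uc() to make the rewritten pattern text visible. The variable names and sample strings are illustrative only.

```perl
use strict;
use warnings;

# uc() performs the same upper-casing that \U applies to pattern text.
my $class_src = '[A-Za-z0-9_]{2,}';
my $word_src  = '\w{2,}';
print uc($class_src), "\n";   # [A-ZA-Z0-9_]{2,}: the a-z range became a second A-Z
print uc($word_src), "\n";    # \W{2,}: now the negated class, "non-word characters"

# The explicit class therefore really does reject lowercase input.
my $class_re = qr/^\U[A-Za-z0-9_]{2,}\E$/;
for my $str ("HELLO", "hello") {
    print "$str: ", ($str =~ $class_re ? "match" : "no match"), "\n";
}
```

The class version still matches only uppercase letters, digits and underscores. By contrast, \w has no lowercase letters in its spelling that could be upper-cased, so upper-casing it turns it into a different escape.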
      I thought \U was supposed to be an escape sequence to convert the character sequence to uppercase.
      It is, but it applies to the pattern, not to the string being matched. It's most useful when the pattern contains an interpolated string, e.g.:
        $ perl -le '$s = "a"; print "\U$s"'
        A
        $ perl -le '$s = "a"; print "matched a" if "a" =~ "\U$s"'
        $ perl -le '$s = "a"; print "matched A" if "A" =~ "\U$s"'
        matched A

      Dave.
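The thread does not say what the original pattern was meant to match. If the intent was to match runs of two or more uppercase letters, one option is to put the case restriction into the character class itself. The Unicode property \p{Lu} ("uppercase letter") does this, and it also fits the thread's Unicode topic. The following is a minimal sketch under that assumption. The sample strings are made up.

```perl
use strict;
use warnings;
use utf8;   # the last sample string below contains a non-ASCII letter

# \p{Lu} is the Unicode "uppercase letter" property, so the case
# restriction sits in the character class itself instead of in \U.
my $upper_word = qr/\b\p{Lu}{2,}\b/;

for my $str ("say HELLO world", "say hello world", "une ÉCOLE") {
    my @found = $str =~ /($upper_word)/g;
    print scalar(@found), " uppercase word(s) found\n";
}
```

Unlike the upper-cased [A-Za-z0-9_] class, \p{Lu} does not accept digits or underscores. As a result, a word such as "HELLO2" is not matched.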