in reply to Re: regular expressions in unicode
in thread regular expressions in unicode

Sorry for the typos.
... I see that, but why? I thought \U was supposed to be an escape sequence that converts the character sequence to uppercase, so I expected \U\w{2,} to match only uppercase characters. \U[A-Za-z0-9_]{2,} does just that, and I thought \w was shorthand for that character class (simply put, ignoring locale settings, Unicode, etc.).

Thanks

Re^3: regular expressions in unicode
by dave_the_m (Monsignor) on Jan 25, 2007 at 22:08 UTC
    I thought \U was supposed to be an escape sequence to convert the character sequence to uppercase.
    It is, but it applies to the pattern, not to the string being matched. It's most useful when the pattern contains an interpolated string, e.g.:
    $ perl -le '$s = "a"; print "\U$s"'
    A
    $ perl -le '$s = "a"; print "matched a" if "a" =~ "\U$s"'
    $ perl -le '$s = "a"; print "matched A" if "A" =~ "\U$s"'
    matched A

    Dave.