Re: regular expressions in unicode

which I would write as /\b\U/w{2,}\E\b\/

Um, just as an aside, that doesn't do what you think it does (even ignoring the typos). It's equivalent to:

/\b\W{2,}\b/
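
A quick way to check the equivalence is to compile the pattern with `qr//` and print it. This is only a sketch: the typos in the quoted pattern are assumed to stand for `/\b\U\w{2,}\E\b/`, and the two sample strings are made up for illustration.

```perl
use strict;
use warnings;

# \U...\E is applied while the pattern text is being built, so it
# uppercases the characters "\w{2,}" themselves and \w becomes \W.
my $re = qr/\b\U\w{2,}\E\b/;

# The stringified pattern shows \W{2,} (the exact flag prefix varies by Perl version).
print "pattern: $re\n";

# Hypothetical sample inputs: one all-uppercase word, one run of non-word characters.
for my $text ("HELLO", "a -- b") {
    printf "%-8s %s\n", "'$text'", ($text =~ $re ? "matched" : "did not match");
}
```

The all-uppercase word is rejected, while the run of non-word characters between `a` and `b` is accepted, which is the opposite of what the pattern was meant to do.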

Dave.

Re^2: regular expressions in unicode by Anonymous Monk on Jan 25, 2007 at 15:22 UTC
Sorry for the typos. ... I see that, but why? I thought `\U` was supposed to be an escape sequence that converts the character sequence to uppercase, so I expected `\U\w{2,}` to match only uppercase characters. `\U[A-Za-z0-9_]{2,}` does just that, and I thought `\w` was a shortcut for that character class (simply put, leaving aside locale settings, Unicode, etc.)? Thanks
Re^3: regular expressions in unicode by dave_the_m (Monsignor) on Jan 25, 2007 at 22:08 UTC
I thought \U was supposed to be an escape sequence to convert the character sequence to uppercase.

It is, but it applies to the pattern, not to the string being matched. It's most useful when the pattern contains an interpolated string, e.g.:

    $ perl -le '$s = "a"; print "\U$s"'
    A
    $ perl -le '$s = "a"; print "matched a" if "a" =~ "\U$s"'
    $ perl -le '$s = "a"; print "matched A" if "A" =~ "\U$s"'
    matched A

Dave.
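
The same point as a small script, for anyone who wants to experiment. It is a sketch, not part of the original reply: the variable names are illustrative, and the last pattern assumes the goal was to match words of two or more uppercase letters, for which a Unicode property such as `\p{Lu}` is one option.

```perl
use strict;
use warnings;

my $s = "a";

# \U uppercases the interpolated pattern text, so this pattern is "A".
my $re = qr/\U$s/;
print "a: ", ("a" =~ $re ? "matched" : "no match"), "\n";
print "A: ", ("A" =~ $re ? "matched" : "no match"), "\n";

# The same happens to a bracketed class: a-z is uppercased to A-Z,
# which is why it appeared to select only uppercase input.
my $class = qr/\U[A-Za-z0-9_]{2,}\E/;
print "class pattern: $class\n";

# Assumed intent: match runs of two or more uppercase letters.
my $upper = qr/\b\p{Lu}{2,}\b/;
my ($word) = "say HELLO there" =~ /($upper)/;
print "uppercase word: $word\n";
```

Printing the compiled class shows why `\U[A-Za-z0-9_]{2,}` seemed to work: the class itself was rewritten, not the input string.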