in reply to Understanding regex

The character class in your second example doesn't include ':' among the permitted characters, so the capture group cannot match the original duplicated substring, which itself contains a colon. The regex engine therefore walks along the string looking for the first duplicate it can match, and 'd: d' is the first one it finds.

To match the initial duplicate, add the colon to the character class:

s/([:()\w\s-]+): \1/$1/;
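
For illustration, here is a small sketch comparing the two character classes. The input string is made up (the original data isn't shown here); it is only assumed to resemble yours in that the duplicated chunk itself contains a colon.

```perl
use strict;
use warnings;

# Hypothetical input: the duplicated chunk "id: desc" contains a colon.
my $input = 'id: desc: id: desc';

# Class without ':' - the capture cannot span the colon, so the first
# duplicate the engine can find is the short "d: d" inside "id: desc".
(my $without_colon = $input) =~ s/([()\w\s-]+): \1/$1/;

# Class with ':' - the capture can hold the whole "id: desc" chunk.
(my $with_colon = $input) =~ s/([:()\w\s-]+): \1/$1/;

print "without colon: $without_colon\n";   # idesc: id: desc
print "with colon:    $with_colon\n";      # id: desc
```

On this sample the first substitution only collapses 'd: d' to 'd', while the second removes the full duplicate.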

Hugo

Re^2: Understanding regex
by Hena (Friar) on Apr 29, 2005 at 11:44 UTC
    You know, sometimes one just feels like an idiot :D. Thanks.

      Install YAPE::Regex::Explain if you don't already have it, as it can break out a regex into a more verbose format that can help you spot things like that.

      $ perl -MYAPE::Regex::Explain -le 'print YAPE::Regex::Explain->new(qr/([()\w\s-]+): \1/ )->explain'
      The regular expression:

      (?-imsx:([()\w\s-]+): \1)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        (                        group and capture to \1:
      ----------------------------------------------------------------------
          [()\w\s-]+               any character of: '(', ')', word
                                   characters (a-z, A-Z, 0-9, _),
                                   whitespace (\n, \r, \t, \f, and " "),
                                   '-' (1 or more times (matching the most
                                   amount possible))
      ----------------------------------------------------------------------
        )                        end of \1
      ----------------------------------------------------------------------
        :                        ': '
      ----------------------------------------------------------------------
        \1                       what was matched by capture \1
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------