If you look at the output of -Mre=debug, you'll see that the REx engine first matches $1='aa', then backtracks until $1='a', so that it can then match \1. You can read more about backtracking in perlre.

    use YAPE::Regex::Explain;
    die YAPE::Regex::Explain->new(
        '^(\w+)(\w+)?(?(2)\2\1|\1)$'
    )->explain;
    __END__
    The regular expression:

    (?-imsx:^(\w+)(\w+)?(?(2)\2\1|\1)$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (                        group and capture to \2 (optional
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )?                       end of \2 (NOTE: because you're using a
                               quantifier on this capture, only the LAST
                               repetition of the captured pattern will be
                               stored in \2)
    ----------------------------------------------------------------------
      (?(2)                    if back-reference \2 matched, then:
    ----------------------------------------------------------------------
        \2                       what was matched by capture \2
    ----------------------------------------------------------------------
        \1                       what was matched by capture \1
    ----------------------------------------------------------------------
       |                        else:
    ----------------------------------------------------------------------
        \1                       what was matched by capture \1
    ----------------------------------------------------------------------
      )                        end of conditional on \2
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    E:\>perl -Mre=debug -le"print 66666 if q[aa] =~ /^(\w+)(\w+)?(?(2)\2\1|\1)$/"
    Freeing REx: `,'
    Compiling REx `^(\w+)(\w+)?(?(2)\2\1|\1)$'
    size 34 first at 2
    synthetic stclass `ANYOF[0-9A-Z_a-z]'.
       1: BOL(2)
       2: OPEN1(4)
       4: PLUS(6)
       5: ALNUM(0)
       6: CLOSE1(8)
       8: CURLYX[1] {0,1}(17)
      10: OPEN2(12)
      12: PLUS(14)
      13: ALNUM(0)
      14: CLOSE2(16)
      16: WHILEM(0)
      17: NOTHING(18)
      18: GROUPP2(20)
      20: IFTHEN(28)
      22: REF2(24)
      24: REF1(33)
      26: LONGJMP(32)
      28: IFTHEN(32)
      30: REF1(33)
      32: TAIL(33)
      33: EOL(34)
      34: END(0)
    floating `'$ at 1..2147483647 (checking floating) stclass `ANYOF[0-9A-Z_a-z]' anchored(BOL) minlen 1
    Guessing start of match, REx `^(\w+)(\w+)?(?(2)\2\1|\1)$' against `aa'...
    Found floating substr `'$ at offset 2...
    Does not contradict STCLASS...
    Guessed: match at offset 0
    Matching REx `^(\w+)(\w+)?(?(2)\2\1|\1)$' against `aa'
      Setting an EVAL scope, savestack=3
       0 <> <aa>    |  1: BOL
       0 <> <aa>    |  2: OPEN1
       0 <> <aa>    |  4: PLUS
                      ALNUM can match 2 times out of 32767...
      Setting an EVAL scope, savestack=3
       2 <aa> <>    |  6: CLOSE1
       2 <aa> <>    |  8: CURLYX[1] {0,1}
       2 <aa> <>    | 16: WHILEM
                      0 out of 0..1 cc=140fb88
      Setting an EVAL scope, savestack=8
       2 <aa> <>    | 10: OPEN2
       2 <aa> <>    | 12: PLUS
                      ALNUM can match 0 times out of 32767...
      Setting an EVAL scope, savestack=8
                      failed...
                      restoring \2..\2 to undef
                      failed, try continuation...
       2 <aa> <>    | 17: NOTHING
       2 <aa> <>    | 18: GROUPP2
       2 <aa> <>    | 20: IFTHEN
       2 <aa> <>    | 30: REF1
                      failed...
                      failed...
                      failed...
       1 <a> <a>    |  6: CLOSE1
       1 <a> <a>    |  8: CURLYX[1] {0,1}
       1 <a> <a>    | 16: WHILEM
                      0 out of 0..1 cc=140fb88
      Setting an EVAL scope, savestack=8
       1 <a> <a>    | 10: OPEN2
       1 <a> <a>    | 12: PLUS
                      ALNUM can match 1 times out of 32767...
      Setting an EVAL scope, savestack=8
       2 <aa> <>    | 14: CLOSE2
       2 <aa> <>    | 16: WHILEM
                      1 out of 0..1 cc=140fb88
       2 <aa> <>    | 17: NOTHING
       2 <aa> <>    | 18: GROUPP2
       2 <aa> <>    | 20: IFTHEN
       2 <aa> <>    | 22: REF2
                      failed...
                      failed...
                      failed...
                      restoring \2..\2 to undef
                      failed, try continuation...
       1 <a> <a>    | 17: NOTHING
       1 <a> <a>    | 18: GROUPP2
       1 <a> <a>    | 20: IFTHEN
       1 <a> <a>    | 30: REF1
       2 <aa> <>    | 33: EOL
       2 <aa> <>    | 34: END
    Match successful!
    66666
    Freeing REx: `^(\w+)(\w+)?(?(2)\2\1|\1)$'

I wouldn't say the 2nd \w+ is useless, because the intent seems to be to match strings like "OneTwoTwoOne".
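
To see what the optional second group buys you, here is a small sketch that runs the same pattern against a few strings and reports the captures. The helper name `describe` and the sample strings other than "aa" and "OneTwoTwoOne" are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# Same pattern as discussed above.
my $re = qr/^(\w+)(\w+)?(?(2)\2\1|\1)$/;

# Hypothetical helper: describe how (or whether) $str matched.
sub describe {
    my ($str) = @_;
    if ( $str =~ $re ) {
        # $2 is undefined when the optional group did not take part
        # in the final match, so guard against that before printing.
        my $two = defined $2 ? "'$2'" : 'undef';
        return "match: \$1='$1' \$2=$two";
    }
    return 'no match';
}

# Sample inputs (assumed for illustration).
printf "%-14s %s\n", $_, describe($_)
    for qw( aa OneTwoTwoOne abab abc );
```

For "aa" this should report $1='a' with $2 undefined, which agrees with the end of the debug trace: the \2\1 branch is tried and fails, \2 is restored to undef, and the plain \1 branch succeeds. For "OneTwoTwoOne" the captures come out as 'One' and 'Two', which is the case the second \w+ exists for.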
In reply to Re: Conditional regex by PodMaster, in thread Conditional regex by chimni