use YAPE::Regex::Explain;
die YAPE::Regex::Explain->new(
'^(\w+)(\w+)?(?(2)\2\1|\1)$'
)->explain;
__END__
The regular expression:
(?-imsx:^(\w+)(\w+)?(?(2)\2\1|\1)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
( group and capture to \2 (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
)? end of \2 (NOTE: because you're using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \2)
----------------------------------------------------------------------
(?(2) if back-reference \2 matched, then:
----------------------------------------------------------------------
\2 what was matched by capture \2
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
| else:
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of conditional on \2
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
E:\>perl -Mre=debug -le"print 66666 if q[aa] =~ /^(\w+)(\w+)?(?(2)\2\1|\1)$/"
Freeing REx: `,'
Compiling REx `^(\w+)(\w+)?(?(2)\2\1|\1)$'
size 34 first at 2
synthetic stclass `ANYOF[0-9A-Z_a-z]'.
1: BOL(2)
2: OPEN1(4)
4: PLUS(6)
5: ALNUM(0)
6: CLOSE1(8)
8: CURLYX[1] {0,1}(17)
10: OPEN2(12)
12: PLUS(14)
13: ALNUM(0)
14: CLOSE2(16)
16: WHILEM(0)
17: NOTHING(18)
18: GROUPP2(20)
20: IFTHEN(28)
22: REF2(24)
24: REF1(33)
26: LONGJMP(32)
28: IFTHEN(32)
30: REF1(33)
32: TAIL(33)
33: EOL(34)
34: END(0)
floating `'$ at 1..2147483647 (checking floating) stclass `ANYOF[0-9A-Z_a-z]' anchored(BOL) minlen 1
Guessing start of match, REx `^(\w+)(\w+)?(?(2)\2\1|\1)$' against `aa'...
Found floating substr `'$ at offset 2...
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx `^(\w+)(\w+)?(?(2)\2\1|\1)$' against `aa'
Setting an EVAL scope, savestack=3
0 <> <aa> | 1: BOL
0 <> <aa> | 2: OPEN1
0 <> <aa> | 4: PLUS
ALNUM can match 2 times out of 32767...
Setting an EVAL scope, savestack=3
2 <aa> <> | 6: CLOSE1
2 <aa> <> | 8: CURLYX[1] {0,1}
2 <aa> <> | 16: WHILEM
0 out of 0..1 cc=140fb88
Setting an EVAL scope, savestack=8
2 <aa> <> | 10: OPEN2
2 <aa> <> | 12: PLUS
ALNUM can match 0 times out of 32767...
Setting an EVAL scope, savestack=8
failed...
restoring \2..\2 to undef
failed, try continuation...
2 <aa> <> | 17: NOTHING
2 <aa> <> | 18: GROUPP2
2 <aa> <> | 20: IFTHEN
2 <aa> <> | 30: REF1
failed...
failed...
failed...
1 <a> <a> | 6: CLOSE1
1 <a> <a> | 8: CURLYX[1] {0,1}
1 <a> <a> | 16: WHILEM
0 out of 0..1 cc=140fb88
Setting an EVAL scope, savestack=8
1 <a> <a> | 10: OPEN2
1 <a> <a> | 12: PLUS
ALNUM can match 1 times out of 32767...
Setting an EVAL scope, savestack=8
2 <aa> <> | 14: CLOSE2
2 <aa> <> | 16: WHILEM
1 out of 0..1 cc=140fb88
2 <aa> <> | 17: NOTHING
2 <aa> <> | 18: GROUPP2
2 <aa> <> | 20: IFTHEN
2 <aa> <> | 22: REF2
failed...
failed...
failed...
restoring \2..\2 to undef
failed, try continuation...
1 <a> <a> | 17: NOTHING
1 <a> <a> | 18: GROUPP2
1 <a> <a> | 20: IFTHEN
1 <a> <a> | 30: REF1
2 <aa> <> | 33: EOL
2 <aa> <> | 34: END
Match successful!
66666
Freeing REx: `^(\w+)(\w+)?(?(2)\2\1|\1)$'
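If you look at the output of -Mre=debug, you'll see that the regex engine first tries $1='aa'. The optional second group then has nothing left to capture, and the else branch's \1 fails at the end of the string, so the engine backtracks to $1='a'. With $1='a' it first tries $2='a', but \2 then fails at the end of the string, so \2 is restored to undef and the else branch matches \1 against the remaining 'a'.
You can read more about backtracking in perlre.
I wouldn't say the 2nd \w+ is useless, because the intent seems to be to match strings like "OneTwoTwoOne".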
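To see which branch of the conditional ends up being used, here is a minimal sketch that runs the same pattern against a few sample strings and reports the captures. The helper name captures() and the sample strings other than "aa" and "OneTwoTwoOne" are my own, chosen only for illustration.

```perl
use strict;
use warnings;

# Same pattern as above: \1\2\2\1 if the second group took part in the
# match, otherwise \1\1.
my $re = qr/^(\w+)(\w+)?(?(2)\2\1|\1)$/;

# Hypothetical helper: returns ($1, $2) on success, an empty list on failure.
# $2 is undef when the "else" branch of the conditional was used.
sub captures {
    my ($str) = @_;
    return unless $str =~ $re;
    my ($first, $second) = ($1, $2);   # copy before anything can clobber them
    return ($first, $second);
}

for my $str (qw(aa abab abba OneTwoTwoOne abc)) {
    my @c = captures($str);
    if (@c) {
        printf "%-13s \$1=%s \$2=%s\n",
            $str, $c[0], defined $c[1] ? $c[1] : 'undef';
    }
    else {
        printf "%-13s no match\n", $str;
    }
}
```

"aa" and "abab" match through the else branch ($2 stays undef), while "abba" and "OneTwoTwoOne" match through \2\1 with $2 set to 'b' and 'Two' respectively. "abc" cannot match at all, since both branches only ever describe strings of even length.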