use YAPE::Regex::Explain;
die YAPE::Regex::Explain->new(
'^(\w+)(\w+)?(?(2)\2\1|\1)$'
)->explain;
__END__
The regular expression:
(?-imsx:^(\w+)(\w+)?(?(2)\2\1|\1)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
( group and capture to \2 (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
)? end of \2 (NOTE: because you're using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \2)
----------------------------------------------------------------------
(?(2) if back-reference \2 matched, then:
----------------------------------------------------------------------
\2 what was matched by capture \2
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
| else:
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of conditional on \2
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
E:\>perl -Mre=debug -le"print 66666 if q[aa] =~ /^(\w+)(\w+)?(?(2)\2\1|\1)$/"
Freeing REx: `,'
Compiling REx `^(\w+)(\w+)?(?(2)\2\1|\1)$'
size 34 first at 2
synthetic stclass `ANYOF[0-9A-Z_a-z]'.
1: BOL(2)
2: OPEN1(4)
4: PLUS(6)
5: ALNUM(0)
6: CLOSE1(8)
8: CURLYX[1] {0,1}(17)
10: OPEN2(12)
12: PLUS(14)
13: ALNUM(0)
14: CLOSE2(16)
16: WHILEM(0)
17: NOTHING(18)
18: GROUPP2(20)
20: IFTHEN(28)
22: REF2(24)
24: REF1(33)
26: LONGJMP(32)
28: IFTHEN(32)
30: REF1(33)
32: TAIL(33)
33: EOL(34)
34: END(0)
floating `'$ at 1..2147483647 (checking floating) stclass `ANYOF[0-9A-Z_a-z]' anchored(BOL) minlen 1
Guessing start of match, REx `^(\w+)(\w+)?(?(2)\2\1|\1)$' against `aa'...
Found floating substr `'$ at offset 2...
Does not contradict STCLASS...
Guessed: match at offset 0
Matching REx `^(\w+)(\w+)?(?(2)\2\1|\1)$' against `aa'
Setting an EVAL scope, savestack=3
0 <> <aa> | 1: BOL
0 <> <aa> | 2: OPEN1
0 <> <aa> | 4: PLUS
ALNUM can match 2 times out of 32767...
Setting an EVAL scope, savestack=3
2 <aa> <> | 6: CLOSE1
2 <aa> <> | 8: CURLYX[1] {0,1}
2 <aa> <> | 16: WHILEM
0 out of 0..1 cc=140fb88
Setting an EVAL scope, savestack=8
2 <aa> <> | 10: OPEN2
2 <aa> <> | 12: PLUS
ALNUM can match 0 times out of 32767...
Setting an EVAL scope, savestack=8
failed...
restoring \2..\2 to undef
failed, try continuation...
2 <aa> <> | 17: NOTHING
2 <aa> <> | 18: GROUPP2
2 <aa> <> | 20: IFTHEN
2 <aa> <> | 30: REF1
failed...
failed...
failed...
1 <a> <a> | 6: CLOSE1
1 <a> <a> | 8: CURLYX[1] {0,1}
1 <a> <a> | 16: WHILEM
0 out of 0..1 cc=140fb88
Setting an EVAL scope, savestack=8
1 <a> <a> | 10: OPEN2
1 <a> <a> | 12: PLUS
ALNUM can match 1 times out of 32767...
Setting an EVAL scope, savestack=8
2 <aa> <> | 14: CLOSE2
2 <aa> <> | 16: WHILEM
1 out of 0..1 cc=140fb88
2 <aa> <> | 17: NOTHING
2 <aa> <> | 18: GROUPP2
2 <aa> <> | 20: IFTHEN
2 <aa> <> | 22: REF2
failed...
failed...
failed...
restoring \2..\2 to undef
failed, try continuation...
1 <a> <a> | 17: NOTHING
1 <a> <a> | 18: GROUPP2
1 <a> <a> | 20: IFTHEN
1 <a> <a> | 30: REF1
2 <aa> <> | 33: EOL
2 <aa> <> | 34: END
Match successful!
66666
Freeing REx: `^(\w+)(\w+)?(?(2)\2\1|\1)$'
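If you look at the output of -Mre=debug, you'll see that the regex engine first tries $1='aa', which fails because nothing is left for \1 to match, and then backtracks to $1='a'. It next tries $2='a', but then \2 has nothing left to match, so $2 is restored to undef and the else branch's \1 matches the second 'a'.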
You can read more about backtracking in perlre.
I wouldn't say the second \w+ is useless, because the intent seems to be to match strings like "OneTwoTwoOne".
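To see where the captures end up after all that backtracking, here is a minimal sketch that runs the same pattern against a few strings. The helper name describe and the test string "abc" are my own additions for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Same pattern as in the debug run above.
my $re = qr/^(\w+)(\w+)?(?(2)\2\1|\1)$/;

# describe() is a hypothetical helper: it reports what $1 and $2
# hold after a successful match, or says that nothing matched.
sub describe {
    my ($str) = @_;
    if ($str =~ $re) {
        return sprintf "%s: \$1=%s \$2=%s",
            $str, $1, defined $2 ? $2 : 'undef';
    }
    return "$str: no match";
}

print describe($_), "\n" for qw(aa OneTwoTwoOne abc);
```

For "aa" the else branch is taken ($2 stays undef), while "OneTwoTwoOne" goes through the \2\1 branch with $1='One' and $2='Two'. A string of odd length such as "abc" cannot match either branch.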