It appears that the combination of repetition, alternation and captures can cause the captures to "leak".
Seen on 5.10 and 5.8:

    use strict;
    #use warnings; #printing lots of undefs
    print "Enter your test strings:\n";
    while (<main::DATA>) {
        chomp;
        print "\tTesting '$_':\n";
        /^(?:(?:(\d+)\s*c\s*)|(?:(\d+)\s*w\s*)|(?:(\d+)\s*r\s*))+/i;
        print "Capturing \\d+ only: 1='$1', 2='$2', 3='$3'\n";
        /^(?:(?:(\d+\s*c)\s*)|(?:(\d+\s*w)\s*)|(?:(\d+\s*r)\s*))+/i;
        print "Capturing \\d+ plus the letter: 1='$1', 2='$2', 3='$3'\n";
    }
    __DATA__
    1c
    2w
    2c3w
    1w1w
    1w2r
    2r1c
Note that the only difference in the regexes is the placement of the capture's closing bracket.
The above prints:

    Enter your test strings:
        Testing '1c':
    Capturing \d+ only: 1='1', 2='', 3=''
    Capturing \d+ plus the letter: 1='1c', 2='', 3=''
        Testing '2w':
    Capturing \d+ only: 1='', 2='2', 3=''
    Capturing \d+ plus the letter: 1='', 2='2w', 3=''
        Testing '2c3w':
    Capturing \d+ only: 1='3', 2='3', 3=''
    Capturing \d+ plus the letter: 1='2c', 2='3w', 3=''
        Testing '1w1w':
    Capturing \d+ only: 1='1', 2='1', 3=''
    Capturing \d+ plus the letter: 1='', 2='1w', 3=''
        Testing '1w2r':
    Capturing \d+ only: 1='2', 2='2', 3='2'
    Capturing \d+ plus the letter: 1='', 2='1w', 3='2r'
        Testing '2r1c':
    Capturing \d+ only: 1='1', 2='', 3='2'
    Capturing \d+ plus the letter: 1='1c', 2='', 3='2r'
The second regex does what I expect. I can't fathom why the first regex would do what it does, however. I expected it would be the same as the second regex, minus the letters. Instead, after the first repetition, it seems that a match for $2 spills a copy into $1, and a match for $3 spills copies into $2 and $1... but only if the captures contain the same regex pattern (\d+ in this case).
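To make the expected result concrete, here is a minimal sketch that produces "the second regex, minus the letters" without putting any capture inside a repeated group. It is only an illustration of the expectation, not an explanation of the behaviour above: the helper name `last_counts` is hypothetical, and the sketch assumes that consuming one number-plus-letter token per iteration with `\G` and `/gc` is an acceptable substitute for the single repeated alternation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original test script): returns the
# last number seen before each of the letters c, w and r, in that order.
# A letter that never appears yields the empty string.
sub last_counts {
    my ($str) = @_;
    my %last = (c => '', w => '', r => '');
    pos($str) = 0;
    # Consume one "digits + letter" token per iteration, starting where the
    # previous match ended (\G), so no capture sits inside a repeated group.
    while ($str =~ /\G(\d+)\s*([cwr])\s*/gci) {
        $last{ lc $2 } = $1;
    }
    return @last{qw(c w r)};
}

for my $test (qw(1c 2w 2c3w 1w1w 1w2r 2r1c)) {
    my ($c, $w, $r) = last_counts($test);
    print "'$test': 1='$c', 2='$w', 3='$r'\n";
}
```

For '2c3w' this gives 1='2', 2='3', 3='', which is what the first regex was expected to report.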
Would I be correct to suspect that this is a bug or mis-optimization of some kind in perl?
In reply to Leaking Regex Captures by SuicideJunkie