Re: Explanation for Reg Expr

It seems to be self-explanatory, with the not-so-badly chosen variable names and all. You can google for "perldoc regex", click on the first result and search that page in your browser for '?='. Cheers.

Update: Sorry about the less-than-helpful note above. As Moron says, the relevant section of perlre is not very long-winded. Here is a rant that is:

When I am confused about a regex (and when am I not?), I have recently been trying my hand at -Mre=debug, in commands such as

perl -Mre=debug -e '@rr = "efgmnoxyz" =~ /(?=(\w\w\w))/g'

which, to tell you the truth, usually confuses me even more (hey, it may be a slow CPU I've got between my ears, but it consumes little power, which makes me a good candidate for high-density datacenter stacking).

It appears that in this particular case, the "Matching REx" lines in the output from the command above may help. The debug output goes to STDERR, so on a Unix-like shell you can keep just those lines by adding 2>&1 | grep 'Matching REx' at the end of the command. I changed your digits to letters here to avoid mixing them up with the numbers that -Mre=debug prints.

The output should be something like:

# perl -Mre=debug -e '@matches = "efgmnoxyz" =~ /(\w\w\w)/g' 2>&1 | grep 'Matching REx'
Matching REx "(\w\w\w)" against "efgmnoxyz"
Matching REx "(\w\w\w)" against "mnoxyz"
Matching REx "(\w\w\w)" against "xyz"

Now, if we do the same thing with the zero-width positive look-ahead assertion, (?=PATTERN), we get:

# perl -Mre=debug -e '@matches = "efgmnoxyz" =~ /(?=(\w\w\w))/g' 2>&1 | grep 'Matching REx'
Matching REx "(?=(\w\w\w))" against "efgmnoxyz"
Matching REx "(?=(\w\w\w))" against "efgmnoxyz"
Matching REx "(?=(\w\w\w))" against "fgmnoxyz"
Matching REx "(?=(\w\w\w))" against "gmnoxyz"
Matching REx "(?=(\w\w\w))" against "mnoxyz"
Matching REx "(?=(\w\w\w))" against "noxyz"
Matching REx "(?=(\w\w\w))" against "oxyz"
Matching REx "(?=(\w\w\w))" against "xyz"

Ignoring the repeated "efgmnoxyz" line, the main difference seems to be that this pattern works its way through the string one character at a time, whereas without the zero-width positive look-ahead assertion it ate up chunks of three.
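The same difference can be seen without the debugger. Here is a minimal sketch (assuming any reasonably modern perl; the names @chunks and @windows are just illustrative, not from the original question) that runs both patterns in list context and prints what each one collects:

```perl
use strict;
use warnings;

# Same string as in the -Mre=debug runs above
my $string = "efgmnoxyz";

# Plain capture: each match consumes three characters
my @chunks = $string =~ /(\w\w\w)/g;

# Capture inside a zero-width look-ahead: each match consumes nothing
my @windows = $string =~ /(?=(\w\w\w))/g;

print "chunks:  @chunks\n";     # efg mno xyz
print "windows: @windows\n";    # efg fgm gmn mno nox oxy xyz
```

The first list has three non-overlapping chunks; the second has every overlapping three-character window.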

The straightforward /(\w\w\w)/g keeps matching three alphanumerics in a row (strictly speaking, \w also matches the underscore), and each time it does match, it jumps to the end of the match to start trying again. Because of the /g modifier, it keeps trying until it can't anymore.
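To watch those jumps, a minimal sketch (the @log array is a hypothetical name used only for illustration) uses //g in scalar context, where each call resumes from pos($string), and records where each match began ($-[0]) and where the next attempt will start:

```perl
use strict;
use warnings;

my $string = "efgmnoxyz";
my @log;

# In scalar context, //g resumes from pos($string) on every call
while ($string =~ /(\w\w\w)/g) {
    # $-[0] is where this match began; pos() is where the next attempt starts
    push @log, sprintf "matched '%s' at %d, next attempt starts at %d",
        $1, $-[0], pos($string);
}
print "$_\n" for @log;
```

This should report matches at offsets 0, 3 and 6, with the next attempt starting at 3, 6 and 9; the attempt from offset 9 finds nothing and ends the loop.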

The trickier /(?=(\w\w\w))/g, on the other hand, tries to match whatever, as long as it is followed by three alphanumerics. The three alphanumerics are not considered to be part of the match; the look-ahead is a way of saying "but beware, whatever must be followed by three alphanumerics!" In this case there is nothing in the pattern before the look-ahead, so whatever is simply the empty string. So why do the three characters of alphanumeric afterthought make it into the array that collects all the captures? Because the parens surrounding the \w\w\w, even though they sit inside the look-ahead rather than in the main match, tell perl that it should capture them.
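One way to convince yourself that the whole match is empty while $1 is not is to look at the match offsets in @- and @+. A minimal sketch (the variables $start, $length and $captured are illustrative names, and /g is left off on purpose so only the first match is inspected):

```perl
use strict;
use warnings;

my $string = "efgmnoxyz";
my ($start, $length, $captured);

# No /g here: we only want to inspect the first match
if ($string =~ /(?=(\w\w\w))/) {
    $start    = $-[0];            # offset where the whole match begins
    $length   = $+[0] - $-[0];    # length of the whole match
    $captured = $1;               # what the parens inside the look-ahead grabbed
}

# The look-ahead consumes nothing, so the whole match is empty,
# yet $1 still holds the three characters it looked at.
print "whole match at offset $start, length $length; \$1 = '$captured'\n";
```

The whole match starts at offset 0 and has length 0, while $1 is 'efg'.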

Comparing our string against /(?=(\w\w\w))/g, any whatever is a match (anything, really) as long as it is followed by three alphanumerics, which we capture. Hey, that makes it easy for perl. Right at the beginning of the string, it has a whatever: a nothing. Great, but is it followed by three alphanumerics? Lo and behold, yes, it is: it is followed by 'efg' (which perl captures because of the parentheses immediately surrounding the \w\w\w). Then, because of the /g modifier, perl tries again. Since that first match was zero-length, perl will not accept another empty match at the same position (otherwise it would loop forever), so it moves forward one character; that restart at the same position is probably also why "efgmnoxyz" shows up twice in the debug output. At the new position it has, of course, another whatever (it always does), but is it followed by three alphanumerics? Lo and behold, yes, it is: it is followed by 'fgm' (captured the same way). And so on, until fewer than three characters are left.
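The walk-through above can be watched step by step with a scalar-context //g loop. A minimal sketch (the @steps array is a hypothetical name); it relies on perl's documented rule that a match following a zero-length match may not itself be zero-length at the same position, which is what keeps this loop from spinning forever:

```perl
use strict;
use warnings;

my $string = "efgmnoxyz";
my @steps;

while ($string =~ /(?=(\w\w\w))/g) {
    # The match is zero-length, so pos() stays at the match start...
    push @steps, sprintf "at %d: captured '%s', pos is %d",
        $-[0], $1, pos($string);
    # ...and on the next call perl refuses another empty match at this
    # same position, so the engine moves forward one character first.
}
print "$_\n" for @steps;
```

Each step should show pos() equal to the match start (0 through 6), and the loop stops after 'xyz' because no three-character window remains.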

That's how I think it is, anyway. You should not take this to heart until wiser monks have had their chance to correct or add to it. Hope this helps.
