in reply to Explanation for Reg Expr
Update: Sorry about the less than helpful note above. As Moron says, the relevant section of perlre is not very long-winded. Here is some rant that is:
One thing to try is running the match under re=debug:

    perl -Mre=debug -e '@rr = "efgmnoxyz" =~ /(?=(\w\w\w))/g'

which, to tell you the truth, usually confuses me some more (Hey, it may be a slow cpu I've got between my ears, but it consumes little power, making me a good candidate for high-density datacenter stacking).
In this particular case, though, the "Matching REx" lines in the output of the command above may help (on some systems you can capture them by adding 2>&1 | grep 'Matching REx' at the end of the command). I changed your digits to letters here, to avoid confusion with the numbering that -Mre=debug generates.
Without the look-ahead, i.e. with a plain /(\w\w\w)/g, the output should be something like:

    # perl -Mre=debug -e '@matches = "efgmnoxyz" =~ /(\w\w\w)/g' 2>&1 | grep 'Matching REx'
    Matching REx "(\w\w\w)" against "efgmnoxyz"
    Matching REx "(\w\w\w)" against "mnoxyz"
    Matching REx "(\w\w\w)" against "xyz"

Now, if we do the same thing with the zero-width positive look-ahead assertion, (?=PATTERN), we get:

    # perl -Mre=debug -e '@matches = "efgmnoxyz" =~ /(?=(\w\w\w))/g' 2>&1 | grep 'Matching REx'
    Matching REx "(?=(\w\w\w))" against "efgmnoxyz"
    Matching REx "(?=(\w\w\w))" against "efgmnoxyz"
    Matching REx "(?=(\w\w\w))" against "fgmnoxyz"
    Matching REx "(?=(\w\w\w))" against "gmnoxyz"
    Matching REx "(?=(\w\w\w))" against "mnoxyz"
    Matching REx "(?=(\w\w\w))" against "noxyz"
    Matching REx "(?=(\w\w\w))" against "oxyz"
    Matching REx "(?=(\w\w\w))" against "xyz"

Ignoring the issue of the repeated "efgmnoxyz", the main difference seems to be that the operation is now eating up one char at a time, whereas it was eating up chunks of three without the zero-width positive look-ahead assertion.
The straightforward /(\w\w\w)/g keeps matching three alphanumerics in a row (strictly, \w means word characters, which include the underscore), and each time it matches, it jumps to the end of the match and starts trying again from there. Because of the /g modifier, it keeps trying until it can't match anymore.
The trickier /(?=(\w\w\w))/g, on the other hand, tries to match whatever, followed by three alphanumerics. The three alphanumerics are actually not considered to be part of the match; the look-ahead is a way of saying "but, beware, whatever must be followed by three alphanumerics!" In this case, there happens to be no particular whatever. So why do the three characters of alphanumeric afterthought still make it into the array which collects all the captures? Because the parens surrounding the \w\w\w, even though they don't appear to be our main target in this regex, tell perl that it should capture them.
Comparing our string against /(?=(\w\w\w))/g, whatever is a match -- anything, really -- as long as it is followed by three alphanumerics, which we capture. Hey, it's easy for perl, then. Right at the beginning of the string, it has a whatever -- a nothing. Great, but is it followed by three alphanumerics? Lo and behold, yes, it is -- it is followed by 'efg' (which perl captures because of the parentheses immediately surrounding the \w\w\w). Then, because of the /g modifier, perl tries again. The match itself was zero-length, and perl will not accept a second empty match at the same position, so it moves forward one step, at which point, of course, it has another whatever (it always does), but is it followed by three alphanumerics? Lo and behold, yes, it is -- it is followed by 'fgm' (which perl captures because of the parentheses immediately surrounding the \w\w\w). And so on, until fewer than three characters remain.
That's how I think it is, anyway. You should not take this to heart until wiser monks have had their chance to correct or add to it. Hope this helps.