eibwen has asked for the wisdom of the Perl Monks concerning the following question:

Ever since I read Obfuscated regexp, I've been trying to figure out how to create a global regex using (?{ }) as opposed to m//g. I've spent a while looking at the code and refactoring it, but I still can't figure out why this iterates:

qr/<([\w\d]+)\s+(?{ print $^N })\s*?>/;

I've passed the code through use re qw/debug/; and YAPE::Regex::Explain, which rewrote the regex:

(?-imsx:<([\w\d]+)\s+(?{ print $^N })\s*?>)

Yet while I (believe I) understand how the first match is made, I still cannot seem to figure out why subsequent matches occur.

I've tried creating simplified regexes to illustrate the idiom:

#!/usr/bin/perl -w
use strict;

my $s = "abacada";                 # 'bcd' delimited by 'a'
print $s =~ /[^a]/g;               # prints 'bcd'
$s =~ /([^a])(?{ print $^N })/;    # prints 'b'

However I cannot seem to duplicate the functionality with a terse example as my attempts have only returned the first match (as use re qw/debug/; confirms). While the reason for this behavior is fairly obvious given the regex, it underscores the fact that I have yet to grasp how the first code iterates.

Can someone please elucidate what I'm missing so that I might progress along the path of enlightenment?

Re: Global Regex sans //g
by tlm (Prior) on May 02, 2005 at 03:13 UTC

    With the regexp that you posted:

    '<a /><b >' =~ /<([\w\d]+)\s+(?{ print $^N })\s*?>/;
    __END__
    ab
    This prints ab because the code block runs as soon as the engine reaches it. At that point the attempt starting at <a is still succeeding, so a is printed. That attempt then fails at the /, since \s*?> cannot match there, so the engine retries at later starting positions, finds <b >, and prints b on the way to a successful match. I'm not sure if this answers your question, but at least it illustrates what looks like iteration.
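
A minimal sketch of the same behaviour, assuming a reasonably modern Perl. Instead of printing from inside the regex, it records each value the code block sees, so the side effects of the failed attempt and of the successful one can be inspected afterwards. The names @printed and $matched are only illustrative, and [\w\d] is written as the equivalent \w.

```perl
use strict;
use warnings;

my @printed;    # what the code block saw, in the order it ran

# Same pattern as above, with push standing in for print.
my $matched = '<a /><b >' =~ /<(\w+)\s+(?{ push @printed, $^N })\s*?>/;

# The block ran once during the failed attempt at "<a /" and once
# during the attempt at "<b >" that finally succeeded.
print "code block ran for: @printed\n";
print "overall match: $&\n" if $matched;
```

Only one match is ever returned here. The extra output comes from a side effect that is not undone when the first attempt is abandoned.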

    BTW, \w already matches everything \d does, so [\w\d] is redundant; \w alone is equivalent.

    Update: Also, if you remove the \1 at the very end of the regexp in Obfuscated regexp, the print statements are executed only once. Clearly this \1 is what causes the successive matches to fail, thus forcing the engine to start searching again. Once it is removed, the match succeeds, and the engine stops.

    the lowliest monk

      Thanks for pointing that out! The output from use re qw/debug/; makes a lot more sense now that I realize what was happening.

      However, using \1 to force matches to fail seems problematic at best, as the string could be repetitive, in which case \1 would succeed and the search would stop prematurely. A contradiction (e.g. [^\D\d]) would preclude this possibility, but are there any implications of such a contradiction beyond this usage?
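
To make the concern concrete, here is a small sketch built on the simplified [^a] pattern from the question rather than on the regexp from Obfuscated regexp, which is not reproduced in this thread. The strings and array names are made up for illustration. A trailing \1 keeps the engine searching only as long as no character is immediately repeated.

```perl
use strict;
use warnings;

# No doubled characters: \1 fails at every candidate, so the engine
# keeps retrying and the code block sees every non-'a' character.
my @plain_seen;
"abacada" =~ /([^a])(?{ push @plain_seen, $^N })\1/;

# The doubled 'b' lets \1 succeed, so the overall match succeeds at
# the first candidate and the engine stops searching.
my @repeat_seen;
"abbacada" =~ /([^a])(?{ push @repeat_seen, $^N })\1/;

print "no repeats:   @plain_seen\n";
print "with repeats: @repeat_seen\n";
```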

        I think such a contradiction would work fine, as would the more succinct (and zero-width) (?!), an empty negative lookahead that always fails.

        the lowliest monk
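
As a rough sketch of the (?!) suggestion applied to the simplified example from the question: the forced failure after the code block makes the engine try every starting position, so the block sees the same values that m//g returns. The array names are illustrative only. Note that the overall match is false, since the pattern can never succeed.

```perl
use strict;
use warnings;

my $s = "abacada";

# Reference result: the usual global match in list context.
my @via_g = $s =~ /[^a]/g;

# Same values without //g: (?!) always fails, so after each run of
# the code block the engine gives up and retries further along.
my @via_fail;
my $ok = $s =~ /([^a])(?{ push @via_fail, $^N })(?!)/;

print "m//g : @via_g\n";
print "(?!) : @via_fail\n";
print "overall match succeeded? ", ($ok ? "yes" : "no"), "\n";
```

Because (?!) is zero-width and can never match, this does not depend on the contents of the string the way a trailing \1 does.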

Re: Global Regex sans //g
by Errto (Vicar) on May 02, 2005 at 02:53 UTC
    I'm not totally clear what you mean. Can you flesh out your first example into a complete code snippet that shows the pattern iterating?