lokiloki has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl

$tempstr = qq`
[IF a = /1/
1
ELSIF b = /2/
2
ELSE
3
]
`;

my $reg = qr{\[(?:(?>[^\[\]]+)|(??{$reg}))*\]};

$tempstr =~ s/\[(?:IF|ELSIF|ELSE)(\s+\S+\s*\=\s*.*?)?\n((?:$reg|\n|[^\[\]])+\])/iffer($1,$2)/ges;

sub iffer {
    my $test = shift;
    $test .= shift;
    print "trying iffer... test: $test\n";
    my $reg = qr{\[(?:(?>[^\[\]]+)|(??{$reg}))*\]};
    $test =~ s/(?:(?:$reg|\n|[^\[\]])+?)(^(?:ELSIF|ELSE\s*\n).*\]|\])/\[$1/sm;
    return $test;
}

The above code runs "iffer" once and results in:

$ ./t.pl
trying iffer... test: a = /1/1
ELSIF b = /2/
2
ELSE
3
]
$

The following code, with the only difference being the inclusion of the while loop, results in a different (and desired) result:

#!/usr/bin/perl

$tempstr = qq`
[IF a = /1/
1
ELSIF b = /2/
2
ELSE
3
]
`;

my $reg = qr{\[(?:(?>[^\[\]]+)|(??{$reg}))*\]};

while ($tempstr =~ s/\[(?:IF|ELSIF|ELSE)(\s+\S+\s*\=\s*.*?)?\n((?:$reg|\n|[^\[\]])+\])/iffer($1,$2)/ges) {}

sub iffer {
    my $test = shift;
    $test .= shift;
    print "trying iffer... test: $test\n";
    my $reg = qr{\[(?:(?>[^\[\]]+)|(??{$reg}))*\]};
    $test =~ s/(?:(?:$reg|\n|[^\[\]])+?)(^(?:ELSIF|ELSE\s*\n).*\]|\])/\[$1/sm;
    return $test;
}

And the results:

$ ./t.pl
trying iffer... test: a = /1/1
ELSIF b = /2/
2
ELSE
3
]
trying iffer... test: b = /2/2
ELSE
3
]
trying iffer... test: 3
]
$

Can someone help me understand why these two produce totally different results? Shouldn't the /g modifier in the first ensure that the pattern is matched repeatedly (i.e., that the ELSIF and then the ELSE patterns match)? (The complexity of the regular expressions is there to account for the possibility of nested conditionals in my own "mini-language".)

Replies are listed 'Best First'.
Re: Complex regex and apparent failure of /g option (pos)
by tye (Sage) on Apr 03, 2007 at 05:15 UTC

    Your original string doesn't start out containing "[ELSIF" so, of course, the s/\[(?:IF|ELSIF|ELSE).../.../ges never replaces the "ELSIF" in the string that is not preceded by "[".

    Only after the first replacement, where the inner s/.../\[$1/sm inserts a "[" before the "ELSIF", does the string contain something the outer replacement could match. But by then the outer replacement has already moved past the position in the string where it would make that replacement, so it won't do it unless you start it over.
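
    A stripped-down sketch of the same effect, using a made-up one-line input and a much simpler pattern than the one in the question (both are assumptions for illustration, not the poster's code). Each replacement leaves a "[" in front of the next keyword, but a single s///g pass never rescans text it has just produced:

```perl
use strict;
use warnings;

# Hypothetical input: one bracketed IF/ELSIF/ELSE chain on a single line.
my $input = "[IF a ELSIF b ELSE c]";

# One pass with /g: only "[IF a " matches, because "ELSIF" has no "["
# in front of it until after the replacement has already been made.
(my $single = $input) =~ s/\[(?:IF|ELSIF|ELSE) \w+ ?/[/g;

# Restarting the substitution until it stops matching picks up the
# "[ELSIF" and "[ELSE" that earlier passes created.
my $looped = $input;
my $passes = 0;
$passes++ while $looped =~ s/\[(?:IF|ELSIF|ELSE) \w+ ?/[/g;

print "single pass: $single\n";
print "looped ($passes passes): $looped\n";
```

    The single pass leaves "[ELSIF b ELSE c]", while the loop needs three passes and ends with "[]".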

    See perldoc -q commas for a similar but simpler example. You can see that they also use an empty loop to deal with the problem. It also shows how look-aheads can be used to avoid the loop (though I'm not sure that approach will work here).
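
    For reference, a sketch of both approaches to the commas problem. The loop version follows the style of the FAQ answer; the look-around version is a simplified variant of my own that only handles plain integers, not the FAQ's exact regex. The sub names are made up for this example.

```perl
use strict;
use warnings;

# Loop version: each pass inserts one comma, and the substitution is
# restarted until no run of four or more leading digits remains.
sub commify_loop {
    my $n = shift;
    1 while $n =~ s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $n;
}

# Loop-free version (integers only): the look-arounds consume nothing,
# so a single /g pass finds every comma position in the original text.
sub commify_lookahead {
    my $n = shift;
    $n =~ s/(?<=\d)(?=(?:\d{3})+(?!\d))/,/g;
    return $n;
}

print commify_loop("1234567"), "\n";
print commify_lookahead("1234567"), "\n";
```

    Both print 1,234,567. The look-around version avoids the loop because its matches are zero-width, so inserting a comma never changes what the remaining matches need to see.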

    - tye        

Re: Complex regex and apparent failure of /g option
by diotalevi (Canon) on Apr 03, 2007 at 06:04 UTC

    Without commenting on anything the other two people have said, your use of lexicals and (??{...}) is incorrect. In general, unless you know perlguts you cannot intuit a non-buggy way to use these. Here is a proper way. It's far easier to just use globals - at least you don't need to know internals to reason about their operation.

    my $reg; $reg = qr/...(??{sub{$reg}->()}).../;
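
    A minimal sketch of the "just use globals" route, applied to the bracket-matching pattern from the question. The names $balanced and is_balanced are made up for this example. Note that the variable is declared first and assigned in a separate statement: in "my $reg = qr{...(??{$reg})...}" the $reg inside the pattern is not the lexical being declared, because a my declaration only takes effect after the statement it appears in.

```perl
use strict;
use warnings;

# Package global, declared before the assignment so that the $balanced
# inside (??{ ... }) is the same variable that holds the pattern.
our $balanced;
$balanced = qr{\[(?:(?>[^\[\]]+)|(??{ $balanced }))*\]};

# Hypothetical helper: true if the whole string is one balanced group.
sub is_balanced {
    my $text = shift;
    return $text =~ /\A$balanced\z/ ? 1 : 0;
}

for my $candidate ("[a [b] c]", "[a [b c]") {
    my $verdict = is_balanced($candidate) ? "balanced" : "not balanced";
    print "$candidate: $verdict\n";
}
```

    The first string is reported as balanced, the second as not balanced.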

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Complex regex and apparent failure of /g option
by parv (Parson) on Apr 03, 2007 at 03:16 UTC
    Could you please use the /x option in your regular expressions to make them more readable (by judicious use of whitespace)?
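
    As a rough idea of what that could look like, here is the outer pattern from the question laid out with /x. This is a sketch, not a tested drop-in replacement: it uses a package global for $reg, and the parse_clause helper and the sample string are made up for the example.

```perl
use strict;
use warnings;

# Balanced bracket group, possibly nested.
our $reg;
$reg = qr{
    \[                      # opening bracket
    (?:
        (?> [^\[\]]+ )      # run of non-bracket characters, no backtracking
      |
        (??{ $reg })        # or a nested bracketed group
    )*
    \]                      # closing bracket
}x;

# One conditional clause, from its keyword to the final closing bracket.
my $clause = qr{
    \[ (?: IF | ELSIF | ELSE )      # keyword right after the opening bracket
    ( \s+ \S+ \s* = \s* .*? )?      # optional condition, captured as group 1
    \n
    (                               # body, captured as group 2
        (?: $reg | \n | [^\[\]] )+
        \]
    )
}xs;

# Hypothetical helper returning the condition and body of the first clause.
sub parse_clause {
    my $text = shift;
    return $text =~ $clause ? ($1, $2) : ();
}

my ($cond, $body) = parse_clause("[IF a = /1/\n1\nELSIF b = /2/\n2\nELSE\n3\n]");
print "condition:$cond\nbody:\n$body\n";
```

    With /x, whitespace and "#" comments in the pattern are ignored, so any literal space would have to be written as \s or escaped; the patterns in the question contain none, which makes the conversion mechanical.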