alafarm has asked for the wisdom of the Perl Monks concerning the following question:

Suppose I have a global substitution like this: s/this/that/g.

I need to capture $&, $-[0], etc. of each match. Unfortunately, this gives me only the last set:

while (s/this/that/g){...}

And this starts the matching at the beginning of the string each time:

while (s/this/that/){...}

Is there a way to stop and evaluate each time the /g matches (without using the /e modifier)?

Thanks. I must have read right through the docs regarding the \G assertion. That should do the trick.

>>>>Just as a matter of curiosity, what is the reason for this requirement?

We have a legacy system in which perl substitutions (thousands of them) are kept as lists in separate files and are read in by a processing engine, which then essentially does this: eval $mySubstitution. We now need to keep track of what was matched and their positions in the document. Of course we don't want to rewrite all these substitutions: we can have the engine manipulate the regex strings a bit as it reads them in, but not make major changes.

Replies are listed 'Best First'.
Re: Loop through global substitution
by moritz (Cardinal) on Jun 20, 2011 at 12:40 UTC
    Here's a somewhat hacky solution, but it seems to work:
    use strict;
    use warnings;
    use 5.010;

    my $string = 'foobar';
    while ($string =~ s/\G.*?\K[aeiou]/\U$&/si) {
        say $&;
    }
    continue {
        pos($string) = $+[-1];
    }
    say $string;

    The trick is to manually set pos, and then use \G in the regex to anchor to that position. But since you want to match anything after that (and not just exactly at the current position), you need a .*? (and the /s modifier if newlines are in the string).

    Now it matches too much text, but the \K assertion cuts off the start of the match, making $& what it should be.

    \K requires perl 5.10.0, but since that's the third-newest major perl 5 release, that shouldn't be a problem. It's not 1995 after all :-)

    Insert generic warning about the runtime overhead of $& here, and how it's better to extract it with substr, @+ and @-.
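
    A minimal sketch of the substr, @- and @+ approach mentioned above, for anyone who wants to avoid $&. The helper name match_spans is made up for illustration and is not from the thread; it only reads matches and does not perform the substitution.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [text, start, end] for every match of $re in $str.
sub match_spans {
    my ($str, $re) = @_;
    my @spans;
    while ($str =~ /$re/g) {
        # $-[0] and $+[0] hold the start and end offsets of the whole match,
        # so substr can rebuild the matched text without touching $&.
        my ($start, $end) = ($-[0], $+[0]);
        push @spans, [ substr($str, $start, $end - $start), $start, $end ];
    }
    return @spans;
}

my @spans = match_spans('foobar', qr/[aeiou]/i);
printf "%s at %d-%d\n", @$_ for @spans;
# prints:
#   o at 1-2
#   o at 2-3
#   a at 4-5
```

    The same three expressions can be dropped into the body of the while loop above in place of $&.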

Re: Loop through global substitution
by AnomalousMonk (Archbishop) on Jun 20, 2011 at 14:05 UTC
    Is there a way ... without using the /e modifier ...

    Just as a matter of curiosity, what is the reason for this requirement? I can imagine one would not want to 'go outside' the regex engine due to performance considerations, but the use of a while loop seems acceptable to alafarm, so why not use the nifty little /e gadget?

Re: Loop through global substitution
by hbm (Hermit) on Jun 20, 2011 at 13:30 UTC
    Perhaps with (?{ code })? Something like this:
    1 while s/this(?{print"$&,$-[0]\n"})/that/;
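
    A variation on the same idea, offered as a sketch rather than a tested drop-in. As far as I know, $& and @- are only reliably set once a match has completed, so inside the code block this version uses $^N (the text of the most recently closed group) and pos() instead. It assumes the engine may wrap the pattern in an extra capture group, which would shift $1, $2, ... in any replacement that uses them. The names @hits and $text are invented for the example.

```perl
use strict;
use warnings;

my @hits;                        # file-scoped so the code block can see it
my $text = 'this and this';

# Inside (?{ ... }) the target string is $_, and pos() is the current offset,
# which sits just past group 1 here. The match start is therefore
# pos() minus the length of the captured text.
$text =~ s/(this)(?{ push @hits, [ $^N, pos() - length($^N) ] })/that/g;

print "$_->[0],$_->[1]\n" for @hits;
print "$text\n";
```

    With /g the substitution makes a single pass, so the recorded offsets refer to the original string. A pattern that backtracks past the code block may run it more often than there are final matches.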
Re: Loop through global substitution
by ikegami (Patriarch) on Jun 20, 2011 at 17:48 UTC
    Another way:
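    my $new = '';
    for (;;) {   # or: for ($orig)
        if (/\G(.*?)(this)/sgc) {
            say "Matched $2 at $-[2]";
            $new .= $1 . 'that';
            redo;
        } else {
            # Copy the unmatched tail. Use $1 here: after this match pos()
            # is at the end of the string, so substr($_, pos) would be empty.
            /\G(.*)/sgc;
            $new .= $1;
            last;
        }
    }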
Re: Loop through global substitution
by AnomalousMonk (Archbishop) on Jun 20, 2011 at 22:29 UTC
    ... perl substitutions ... are kept as lists in separate files and are read in by a processing engine, which then essentially does this: eval $mySubstitution. We now need to keep track of what was matched and their positions in the document.

    I think the approaches based on \G already posted will serve perfectly well. However, I cannot help thinking the original eval-based process (insofar as I understand it) could easily be extended.

    I imagine the process described above as in the first code sequence below: regex strings and replacement strings wind up in an array or arrays; a loop works through the array(s) eval-ing a s/// statement for each substitution.

    This could easily be extended as in the second code sequence below. No change whatsoever to either regex string or replacement string seems necessary.

    >perl -wMstrict -le
    "my @search  = qw(foo fee fie foe );
     my @replace = qw(bar fum \u$& \U$&);
     ;;
     my $document = 'the foo is fee and fie-foe.';
     ;;
     while (my ($i, $rx) = each @search) {
       eval qq{ \$document =~ s{$rx}{$replace[$i]}g };
       }
     print qq{'$document'};
     ;;
     $document = 'all foo are fee or fie/foe';
     ;;
     my @replaced;
     while (my ($i, $rx) = each @search) {
       eval qq{ \$document =~ s{$rx} { push \@replaced, [\$&,\$-[0],\$+[0]]; qq{$replace[$i]} }eg };
       }
     print qq{'$document'};
     use Data::Dumper;
     print Dumper \@replaced;
    "
    'the bar is fum and Fie-FOE.'
    'all bar are fum or Fie/FOE'
    $VAR1 = [
              [
                'foo',
                '4',
                '7'
              ],
              [
                'fee',
                '12',
                '15'
              ],
              [
                'fie',
                '19',
                '22'
              ],
              [
                'foe',
                '23',
                '26'
              ]
            ];
Re: Loop through global substitution
by ikegami (Patriarch) on Jun 20, 2011 at 17:41 UTC

    while (s/this/that/g) { push @matches, $&; }
    Better:
    while (s/this/that/gp) { push @matches, ${^MATCH}; }

    Update: Thought it was m//g.
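
    For comparison, the loop does work as written when the operator is m//g, which suggests a two-pass sketch: record the matches first, then run the unchanged substitution. This assumes the engine can afford to scan each document twice. The names $doc and @matches are invented for the example, and /p with ${^MATCH} needs perl 5.10 or later.

```perl
use strict;
use warnings;

my $doc = 'this or this';
my @matches;

# First pass: m//g in scalar context advances pos() one match at a time,
# so each iteration sees exactly one match and its start offset.
while ($doc =~ /this/gp) {
    push @matches, [ ${^MATCH}, $-[0] ];
}

# Second pass: the original substitution, untouched.
$doc =~ s/this/that/g;

print "$_->[0] at $_->[1]\n" for @matches;
print "$doc\n";
```

    The offsets recorded this way refer to the document as it was before the substitution ran.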