diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

In both of these examples $_ is aliased to the bound string and pos() accurately reflects the current state of the engine. If I modify pos() inside the regex, the new value is visible until the current re_eval exits, and then it resets to its original value. So what is going on here, and what do I need to do to alter the currently bound string's pos() value?

This is the really simple case. I included it so all the instrumentation from the other regex won't clutter things up.

$\ = "\n";
$, = ",";
print 'a string' =~ m/(.(?{pos()+=4}))/g;
__END__
prints "a, ,s,t,r,i,n,g". It should print "a,i".

Sample B shows this in greater detail.

'14567890ab' =~ m/(.(?{
        $char = substr($_,pos(),1);
        print "> ".pos()." \"$char\" ";
        if ($char eq '4' or $char eq '6' or $char eq '8') {
            print "+".(0+$char)." ";
            pos() += 0+$char;
        } else {
            print '+0 ';
        }
        printf "< %2d \"". substr($_,pos(),1)."\" ", pos();
    })
    (?{printf "-> %2d\n", pos}))+/x;
__DATA__
> 1 "4" +4 < 5 "8" -> 1 # alter pos()
> 2 "5" +0 < 2 "5" -> 2
> 3 "6" +6 < 9 "b" -> 3 # alter pos()
> 4 "7" +0 < 4 "7" -> 4
> 5 "8" +8 < 10 "" -> 5 # alter pos()
> 6 "9" +0 < 6 "9" -> 6
> 7 "0" +0 < 7 "0" -> 7
> 8 "a" +0 < 8 "a" -> 8
> 9 "b" +0 < 9 "b" -> 9
> 10 "" +0 < 10 "" -> 10
__SIG__
use B;
printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE;

Re: immutable pos() inside regex
by pg (Canon) on Dec 08, 2002 at 18:54 UTC
    You did successfully modify the value that pos() is locally bound to, but it is just your own local copy, which only exists and is meaningful within your {} scope. When you reenter that {} scope, the value for pos() is taken from what is stored by the regexp engine itself.

    This does make sense, and it makes the regexp itself more robust.
    $\ = "\n";
    $, = ",";
    print 'a string' =~ m/(.)(?{
        print "before mod, pos = ".pos();
        pos()+=4;
        print "after mod, pos = ".pos();
    })/g;
    print pos(); # pos() is undef here: we are no longer inside any regexp

      That much is obvious: inside (?{}) there is a pos() value which can be manipulated. Now, is $_ a copy of the bound string or is it the actual string? See... if $_ is the actual string then stomping on pos() should work. Hmm... maybe I just answered my own question. Maybe instead of aliasing $_, which is what I thought was happening, it's copying it.

      No wait... I included a code fragment that shows that $_ is aliased to $str and that $str's pos value is being altered inside the (?{}). It looks like the regex engine is replacing the pos() value with a saved copy. So again - do you know a way to keep the regex engine from copying pos() back into place?

      use Devel::Peek;
      $str = 'a string';
      print 'x' x 50, "\n";
      Dump($str);
      $str =~ m/(?:.(?{
          Dump($_); Dump(pos());
          pos()+=3;
          Dump($_); Dump(pos())
      }))/g;
      print STDERR "OUT\n";
      Dump($str);
        The following piece of code clearly demonstrates two things:
        1. $_ is $str, based on the facts that:
          1. addresses are the same;
          2. $str has been changed even outside the regexp.
        2. However, it also clearly demonstrates that the regexp engine copies the target string only once, at the beginning, when you enter the engine: as you can see, $1 steps through the characters of "a string", not "abc". The regexp engine uses its own copy of the target, which again makes sense: whoever wrote it needed to make the engine robust. They simply do not allow you to alter the engine's state and properties in the way you thought you could, because they considered it dangerous.
        $\ = "\n";
        $, = ",";
        $str = "a string";
        print \$str;
        $str =~ m/(?:(.)(?{
            print "address of \$_ is ".\$_.";";
            print '$_ = '.$_.";";
            print "\$str = ".$str.";";
            print "\$1 = $1\n";
            $_="abc"
        }))+/;
        print $str;
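
If the engine does restore its own position after each (?{}) block, as the replies above suggest, one possible workaround is to move pos() between matches instead of inside the pattern. This is only a minimal sketch, assuming a scalar-context m//g loop anchored with \G is acceptable for the task. The names @picked and $next are made up for illustration, and the `last` guard is a choice made here, not something taken from the thread.

```perl
use strict;
use warnings;

my $str = 'a string';
my @picked;

# In scalar context, //g resumes at pos($str); \G anchors the match there.
while ($str =~ m/\G(.)/gs) {
    push @picked, $1;
    my $next = pos($str) + 4;      # skip the next four characters
    last if $next > length $str;   # nothing left to match past the end
    pos($str) = $next;             # set between matches, so the next //g starts here
}

print join(',', @picked), "\n";    # prints "a,i"
```

Because the assignment to pos($str) happens outside the pattern, the next iteration starts from the new offset, which gives the "a,i" result the original one-liner was aiming for.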