in reply to immutable pos() inside regex

You did successfully modify the value that pos() is locally bound to, but that is just your own local copy, which exists and is meaningful only within your {} scope. Each time you re-enter that {} scope, the value of pos() is taken from what the regexp engine itself has stored.

This does make sense, and it makes the regexp itself more robust.
$\ = "\n";
$, = ",";
print 'a string' =~ m/(.)(?{
    print "before mod, pos = " . pos();
    pos() += 4;
    print "after mod, pos = " . pos();
})/g;
print pos();    # causes an error: pos() is undef, as we are now not inside any regex
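
For comparison, here is a minimal sketch of how pos() behaves outside of any (?{}) block. The variable names ($s, $t) are mine, not from the original post. It only illustrates the documented behavior that pos() belongs to a particular scalar, is advanced by a successful //g match in scalar context, and is reset to undef when a //g match finally fails.

```perl
use strict;
use warnings;

# pos() is tracked per scalar; a string that has never been matched has none.
my $s = 'a string';
my $before = pos($s);               # undef: no //g match has run on $s yet

# A successful scalar-context //g match leaves pos() just past the match.
$s =~ /(.)/g;
my $after_first = pos($s);          # 1: one character consumed

# List-context //g keeps matching until it fails, and the failure resets pos().
my $t = 'a string';
my @chars = $t =~ /(.)/g;           # all eight characters
my $after_list = pos($t);           # undef again

print "before:      ", (defined $before      ? $before      : 'undef'), "\n";
print "after first: ", (defined $after_first ? $after_first : 'undef'), "\n";
print "after list:  ", (defined $after_list  ? $after_list  : 'undef'), "\n";
```

This is why the final print pos() in the snippet above has nothing to report: it looks at $_, which was never the target of a //g match.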

Re^2: immutable pos() inside regex
by diotalevi (Canon) on Dec 08, 2002 at 19:13 UTC

    That much is obvious: inside of (?{}) there is a pos() value which can be manipulated. Now, is $_ a copy of the bound string, or is it the actual string? See... if $_ is the actual string, then stomping on pos() should work. Hmm... maybe I just answered my own question. Maybe, instead of aliasing $_ (which is what I thought was happening), it's copying it.

    No wait... I included a code fragment that shows that $_ is aliased to $str and that $str's pos value is being altered inside the (?{}). It looks like the regex engine is replacing the pos() value with a saved copy. So again - do you know a way to keep the regex engine from copying pos() back into place?

    use Devel::Peek;
    $str = 'a string';
    print 'x' x 50, "\n";
    Dump($str);
    $str =~ m/(?:.(?{
        Dump($_);
        Dump(pos());
        pos() += 3;
        Dump($_);
        Dump(pos())
    }))/g;
    print STDERR "OUT\n";
    Dump($str);
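
A hedged aside on the question above: this does not make the assignment stick from inside (?{}), but pos() is an ordinary lvalue between matches, so one possible workaround is to drive the match from a scalar-context //g loop and adjust pos($str) in the loop body. The sketch below assumes that a skip of 3 after the first match is what is wanted; @seen is a name I made up for illustration.

```perl
use strict;
use warnings;

my $str = 'a string';
my @seen;

# Each iteration is a separate match that starts at the current pos($str),
# so a change made between iterations is honored by the next match.
while ($str =~ /(.)/g) {
    push @seen, $1;
    pos($str) += 3 if @seen == 1;   # after the first match, jump ahead 3 chars
}

print join(',', @seen), "\n";       # prints "a,r,i,n,g"
```

The difference from the (?{}) version is that here the engine has already returned, so there is no saved position for it to put back.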
    __SIG__ use B; printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE;
      The following piece of code clearly demonstrates two things:
      1. $_ is $str, based on the facts that:
        1. the addresses are the same;
        2. $str has been changed even outside the regexp.
      2. However, it also clearly shows that the regexp engine copies the target string only once, at the beginning, when you enter the engine: as you can see, $1 steps through "a string", not "abc". The regexp engine uses its own copy of the target string, which again makes sense: whoever wrote the engine needed to make it robust. They simply do not allow you to alter the state and properties of the regexp engine in the way you thought you could, because they considered it dangerous.
      $\ = "\n";
      $, = ",";
      $str = "a string";
      print \$str;
      $str =~ m/(?:(.)(?{
          print "address of \$_ is " . \$_ . ";";
          print '$_ = ' . $_ . ";";
          print "\$str = " . $str . ";";
          print "\$1 = $1\n";
          $_ = "abc"
      }))+/;
      print $str;