PerlMonks  

Global match and capture group in look-ahead == under-populated "@-"?

by Anonymous Monk
on Nov 19, 2022 at 11:47 UTC ( [id://11148253]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

use strict;
use warnings;
use Data::Dump 'dd';

$_ = 'a';
dd s/(?=(a))/b/;
dd @-;
dd $_;

$_ = 'a';
dd s/(?=(a))/b/g;
dd @-;
dd $_;

Is this a bug?

1
(0, 0)
"ba"
1
0
"ba"

Replies are listed 'Best First'.
Re: Global match and capture group in look-ahead == under-populated "@-"?
by haukex (Archbishop) on Nov 19, 2022 at 13:20 UTC
    Is this a bug?

    The regexp debugger shows that in the /g case, the last match in the "implicit loop" of /g is a failed one (see also Repeated Patterns Matching a Zero length Substring), probably leading to that result in @-; note that $+ and $^N are also undef in this case. In addition to the below, try perl -MRegexp::Debugger -e '$_="a";s/(?=(a))/b/g'.

    However, using those variables after a /g match feels a little sketchy to me, so this may be an XY problem, so if you could explain the bigger picture, perhaps we can help solve that.

    $ perl -Mre=debug -e '$_="a";s/(?=(a))/b/g'
    Compiling REx "(?=(a))"
    Final program:
       1: IFMATCH[0] (11)
       3: OPEN1 (5)
       5: EXACT <a> (7)
       7: CLOSE1 (9)
       9: SUCCEED (0)
      10: TAIL (11)
      11: END (0)
    minlen 0
    Matching REx "(?=(a))" against "a"
       0 <> <a>  |   0| 1:IFMATCH[0](11)
       0 <> <a>  |   1| 3:OPEN1(5)
       0 <> <a>  |   1| 5:EXACT <a>(7)
       1 <a> <>  |   1| 7:CLOSE1(9)
       1 <a> <>  |   1| 9:SUCCEED(0)
                 |   1| SUCCEED: subpattern success...
       0 <> <a>  |   0| 11:END(0)
    Match successful!
    Matching REx "(?=(a))" against "a"
       0 <> <a>  |   0| 1:IFMATCH[0](11)
       0 <> <a>  |   1| 3:OPEN1(5)
       0 <> <a>  |   1| 5:EXACT <a>(7)
       1 <a> <>  |   1| 7:CLOSE1(9)
       1 <a> <>  |   1| 9:SUCCEED(0)
                 |   1| SUCCEED: subpattern success...
       0 <> <a>  |   0| 11:END(0)
    END: Match possible, but length=0 is smaller than requested=1, failing!
       1 <a> <>  |   0| 1:IFMATCH[0](11)
       1 <a> <>  |   1| 3:OPEN1(5)
       1 <a> <>  |   1| 5:EXACT <a>(7)
                 |   1| failed...
                 |   0| failed...
    Match failed
    Freeing REx: "(?=(a))"
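To compare the two cases side by side without Data::Dump, here is a minimal sketch. The helper names `snapshot` and `show` are made up for illustration and are not from the thread. It copies @-, @+, $1, $+ and $^N right after the substitution, in the same block, so nothing else can overwrite them. The thread reports that with /g @- holds a single element and that $+ and $^N are undef; the script only prints whatever your perl gives, so treat it as a way to check, not as a statement of what every version does.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run the substitution, then
# copy the match variables immediately, while still in the same block.
sub snapshot {
    my ($use_g) = @_;
    local $_ = 'a';
    my $count = $use_g ? s/(?=(a))/b/g : s/(?=(a))/b/;
    my %seen = (
        count   => $count,
        result  => $_,
        starts  => [@-],    # copy of @-
        ends    => [@+],    # copy of @+
        one     => $1,
        plus    => $+,
        caret_n => $^N,
    );
    return \%seen;
}

# Render undef visibly instead of triggering an "uninitialized" warning.
sub show {
    my ($value) = @_;
    return defined($value) ? "'$value'" : 'undef';
}

my $plain  = snapshot(0);
my $global = snapshot(1);

for my $case ( [ 'without /g', $plain ], [ 'with /g', $global ] ) {
    my ( $label, $s ) = @$case;
    printf '%-10s last index of @-: %d, of @+: %d, $1=%s $+=%s $^N=%s' . "\n",
        $label,
        $#{ $s->{starts} }, $#{ $s->{ends} },
        show( $s->{one} ), show( $s->{plus} ), show( $s->{caret_n} );
}
```

Both calls leave "ba" in the target string; only the bookkeeping variables are expected to differ.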
      > END: Match possible, but length=0 is smaller than requested=1, failing!

      Dunno which length is meant, but I have a hunch that this is a mechanism to avoid endless loops restarting from the same pos.

      edit

      seems so: Repeated Patterns Matching a Zero-length Substring

      > WARNING: Difficult material (and prose) ahead. ...

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery
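The zero-length safeguard mentioned above can be seen in isolation with a small sketch. The first loop uses a pure look-ahead, which can only match an empty substring, yet the /g loop still terminates because a second zero-length match at the same position is refused. The second snippet is the example given in perlre's section "Repeated Patterns Matching a Zero-length Substring". The variable names are mine.

```perl
use strict;
use warnings;

# A pure look-ahead can only ever match a zero-length substring.
my $text = 'abc';
my @positions;
while ( $text =~ /(?=[a-c])/g ) {
    push @positions, pos($text);
}
print "match positions: @positions\n";    # 0 1 2

# Example taken from perlre's section on zero-length matches.
my $bar = 'bar';
$bar =~ s/\w??/<$&>/g;
print "$bar\n";                           # <><b><><a><><r><>
```

In the second case the non-greedy \w?? first matches empty, and the retry at the same position is forced to consume one character instead, which is why empty and one-character matches alternate.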

      It was a golfing exercise, where I tried everything that seemed correct from the language spec's point of view as I see it, sketchy or not. In the end I didn't need @-, so it's no problem at all, thank you; this is just curiosity. Perhaps I know less about regular expressions than I thought.

      Consistent behaviour would be $#- and $#+ both the same and either 0 or 2 here. And all three $1, $+, $^N ending up the same here, also. It's not what I observe.

      Can you explain the debug output, why the

      Matching REx "(?=(a))" against "a"

      and then

      SUCCEED: subpattern success...

      are mentioned twice?

      It's the same with perl -MRegexp::Debugger -e '$_="a";s/(?=(a))/b/g': I have to hit Enter too many times. Why are steps 0..7 repeated twice?

        $#- and $#+ both the same and either 0 or 2

        please read "either -1 or 1"

Re: Global match and capture group in look-ahead == under-populated "@-"? (updated)
by LanX (Saint) on Nov 19, 2022 at 14:19 UTC
    https://perldoc.perl.org/perlvar#@-

      $-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.
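As a small illustration of the quoted documentation, here is a sketch on an unrelated string (the pattern and variable names are mine, not from the thread). It shows that $-[n] and $+[n] delimit the n-th capture, and that @- stops at the last group that actually matched while @+ has one slot per group in the pattern, as perlvar describes for $#- and $#+.

```perl
use strict;
use warnings;

my $str = 'hello world';
my ( @starts, @ends );

# Group 2 is optional and does not match here.
if ( $str =~ /(wor)(x)?/ ) {
    @starts = @-;    # copy right away, in the same block as the match
    @ends   = @+;
}

# $1 can be rebuilt from the offsets of group 1.
my $group1 = substr( $str, $starts[1], $ends[1] - $starts[1] );

print "match starts at $starts[0]\n";      # 6
print "group 1 is '$group1'\n";            # wor
print "last index of \@- is $#starts\n";   # 1 (last group that matched)
print "last index of \@+ is $#ends\n";     # 2 (groups in the pattern)
```

So a shorter @- is normal whenever trailing groups did not take part in the match; the open question in this thread is why that happens for a group that apparently did.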

    edit

    It's hard to tell which result you expect, but the way pos is changed after replacing a zero-length assertion is tricky. It's meant to avoid endless loops.

    use v5.12;
    use warnings;
    use Data::Dump 'dd';

    $_ = 'aa';
    s/(?=(a))(?{say "\$1='$1' \@-=(@-) pos=",pos})/b/g;
    dd [$1];
    dd \@-;
    dd $_;
    __END__
    $1='a' @-=(0 0) pos=0
    $1='a' @-=(0 0) pos=0
    $1='a' @-=(1 1) pos=1
    $1='a' @-=(1 1) pos=1
    ["a"]
    [1]
    "baba"

    update

    this may shed some light

    use v5.12;
    use warnings;
    use Data::Dump 'dd';

    $_ = 'ab';
    s/(?=([ab]))(?{say "\$1='$1' \@-=(@-) pos=",pos})/X/g;
    dd [$1];
    dd \@-;
    dd $_;
    __END__
    $1='a' @-=(0 0) pos=0
    $1='a' @-=(0 0) pos=0
    $1='b' @-=(1 1) pos=1
    $1='b' @-=(1 1) pos=1
    ["b"]
    [1, 1]
    "XaXb"

    Cheers Rolf

Re: Global match and capture group in look-ahead == under-populated "@-"?
by Anonymous Monk on Nov 19, 2022 at 12:34 UTC
    Dynamic scope?
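If the suggestion refers to the way capture variables are dynamically scoped to the enclosing block, the following sketch shows that behaviour on its own. It is an assumption on my part that this is what was meant, and the thread does not establish whether it bears on the @- result. The variable names are illustrative.

```perl
use strict;
use warnings;

my ( $inner, $outer );

'x' =~ /(x)/ or die 'outer match failed';
{
    # A successful match in a nested block sets $1 for that block only.
    'y' =~ /(y)/ or die 'inner match failed';
    $inner = $1;
}
# Back in the outer block, $1 refers to the outer match again.
$outer = $1;

print "inner: $inner, outer: $outer\n";    # inner: y, outer: x
```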
