BUU has asked for the wisdom of the Perl Monks concerning the following question:

Observe:
$_="a1 a2 a3"; while(?a(\d)?g){ print $1; }
It prints "1", and that's all, just as you would expect. The ?? only matches once and then the loop fails. The g modifier has no effect. But observe the following:

$_="a1 a2 a3"; while(?a(\d)?g){ print $1; } while(?a(\d)?g){ print $1; }
It prints "12". Repeating the ?? code causes it to match again, but from a different spot. Anyone want to explain this?

Update:
Apparently regular regexes do this also:
$_="a1 a2 a3 a4"; while(/a(\d)/g){ print $1; last; } while(/a(\d)/g){ print $1; }
Actually prints "1234", not "11234", as you might expect. My question is, why is the second occurrence of the regex affected by the first? They're two completely separate regexen, at least as far as I can tell.

Replies are listed 'Best First'.
Re: Bizarreness in ?PATTERN? and g
by bmann (Priest) on Jun 04, 2004 at 05:01 UTC
    It's by design. I did the same thing once.

    From perlop:

    ?PATTERN?
    This is just like the /pattern/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only ?? patterns local to the current package are reset.

        while (<>) {
            if (?^$?) {
                # blank line between header and body
            }
        } continue {
            reset if eof;    # clear ?? status for next file
        }

    This usage is vaguely deprecated, which means it just might possibly be removed in some distant future version of Perl, perhaps somewhere around the year 2168.

    The solution? Use a different delimiter, but I'm sure you already know that ;)

      No no no, you're missing my point. I realize that ?? only matches once, but shouldn't the second regex be separate from the first regex?
      /regex1/ /regex2/
      Regex1 and 2 shouldn't have anything to do with each other, should they? But in my example, the two ??'s are affecting each other, even though they're completely separate!

        If you want them separate then remove the g modifier.

        $_="a1 a2 a3";
        while(?a(\d)?){ print $1; }
        while(?a(\d)?){ print $1; }
        __END__
        11

        Updated to add in the snippet.

Re: Bizarreness in ?PATTERN? and g
by BrowserUk (Patriarch) on Jun 04, 2004 at 05:00 UTC

    Using ?? makes the whiles act like ifs.

    perl> $_="a1 a2 a3"; if( /a(\d)/g ){ print $1, ' ', pos( $_ ); } if( /a(\d)/g ){ print $1, ' ', pos( $_ ); }
    1 2 2 5

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      Well yeah, because the ??'s only return true once. But surely when you repeat the code, it should be a completely separate regex. Why should its match be determined by the first regex?

        Because pos( $_ ) isn't being reset. Sort of like you had used /gc. I'm not sure if that constitutes a bug or not?


Re: Bizarreness in ?PATTERN? and g
by Mr. Muskrat (Canon) on Jun 04, 2004 at 05:04 UTC

    That is not weird at all. Perlop says this:

    ?PATTERN?
    This is just like the /pattern/ search, except that it matches only once between calls to the reset() operator.

    The reason the second snippet works is because you have two separate instances of the same pattern match. The g allows you to continue with the next match (whether that is due to a reset or a separate pattern match). Observe:

    $_="a1 a2 a3";
    while(?a(\d)?g){ print $1; reset; }
    __DATA__
    123

    If you remove that g, you end up with an infinite loop of 1's. Remove the reset and the g and you are right back to the output being a single, solitary 1.
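For anyone trying this on a current Perl: the bare ?PATTERN? form was removed in Perl 5.22, so the snippet needs the explicit m?PATTERN? spelling. The following is only a sketch of the reset-inside-the-loop idea described above; the names $text and $out are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical variable names; the string is the one used in the thread.
my $text = "a1 a2 a3";
my $out  = '';

# m?...? succeeds at most once until reset() is called. With /g in
# scalar context each success leaves pos($text) after the match, so the
# next pass looks for the following "aN".
while ($text =~ m?a(\d)?g) {
    $out .= $1;
    reset;    # clear the "already matched" state of m?? in this package
}

print "$out\n";    # prints 123
```

The loop ends only when the pattern genuinely runs out of matches, not because of the once-only rule, since reset re-arms the operator on every pass.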

Re: Bizarreness in ?PATTERN? and g
by beth (Scribe) on Jun 04, 2004 at 05:11 UTC
    BUU's code doesn't seem so bizarre - as many people have commented, it kinda makes sense.

    But compare this:
    $s="a1 a2 a3"; for (1..2) { print $1 if $s =~ ?a(\d+)?g; }
    with this:
    $s="a1 a2 a3"; print $1 if $s =~ ?a(\d+)?g; print $1 if $s =~ ?a(\d+)?g;
    I would expect these to be equivalent ... but no, the first example prints "1" and the second prints "1 2".

    The plot thickens!

    update 2004-06-04 01:41 - added conditionals so the first snippet prints "1" rather than "1 1".


    --
    eval pack("H*", "7072696e74207061636b2822482a222c202236613631373036382229");
    # japh or forkbomb? You decide!
      Try printing the value of $_ before each regex and you will see that you try to match 1 and 2 in your first example as the for-loop automagically loads $_ with the "loop index".

      As an aside, your first example prints nothing as the regex never matches!

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      So, hmm. It seems ?? only acts differently if it's inside a loop. The perlop docs mention reset, and reset is only really useful for loops, so perhaps ?? is too? Hopefully someone who knows will come clear this up.

        Each occurrence of ?? in code will match only once between resets. If you have more than one ??, each one can match once.
        japhy took a look at this but I guess he didn't post about it yet. Basically, loops and repeated statements are not interchangeable - they're compiled differently. He suggested running the following two tests, which are quite enlightening:
        perl -MO=Terse -e '/x/; /x/;'
        perl -MO=Terse -e '/x/ for 1, 2'


Re: Bizarreness in ?PATTERN? and g
by Anonymous Monk on Jun 04, 2004 at 04:58 UTC
    perl -le '$s="a1, a3 - a5\na11"; while ($i<10) { $i++; $s=~?a(\d+)?g; print $1; }'
    prints 1, 10 times
    perl -le '$s="a1, a3 - a5\na11"; while ($i<10) { $i++; $s=~/a(\d+)/g; print $1; }'
    prints 1 3 5 11 11 1 3 5 11 11

    vs.

    perl -le '$s="a1, a3 - a5\na11"; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1;'
    which prints 1 3 5 11

    so the CODE ??g repeated in the SOURCE seems to behave differently than for (1..4) { ??g }, i.e. the CODE ??g EXECUTED multiple times in the code.

    I am confused.

    Edited by Chady -- added code tags.

Re: Bizarreness in ?PATTERN? and g
by CountZero (Bishop) on Jun 04, 2004 at 05:55 UTC
    $value1="a1 a2 a3 a4"; $value2=$value1; while($value1=~/a(\d)/g){ print $1; last; } while($value2=~/a(\d)/g){ print $1; }

    does indeed output 11234, so what matters is not the regex but whether the variable you are matching against is the same or not.


Re: Bizarreness in ?PATTERN? and g
by Roy Johnson (Monsignor) on Jun 04, 2004 at 13:43 UTC
    The //g modifier is tied to pos(), which is tied to the scalar being matched against. All matches on the same scalar with the g option read and set the same pos().

    The ?? is more like a flip-flop, where the counter is attached to the expression itself, so separate ?? expressions each match once, even if they use the same pattern and/or match against the same scalar.
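To make the pos() point concrete, here is a small sketch using ordinary //g matches. It assumes nothing beyond core Perl; the variable names ($same, $copy, $shared, $separate) are invented for the illustration.

```perl
use strict;
use warnings;

my $same = "a1 a2 a3 a4";
my $copy = $same;            # the copy has its own, still unset, pos()
my ($shared, $separate) = ('', '');

# First match on $same: finds "a1" and leaves pos($same) at offset 2.
if ($same =~ /a(\d)/g) {
    $shared   .= $1;
    $separate .= $1;
}
my $pos_after_first = pos($same);

# A textually different match on the SAME scalar resumes from pos($same).
while ($same =~ /a(\d)/g) { $shared .= $1 }

# The same pattern applied to the COPY starts from the beginning.
while ($copy =~ /a(\d)/g) { $separate .= $1 }

print "$pos_after_first\n";    # 2
print "$shared\n";             # 1234
print "$separate\n";           # 11234
```

The two while loops use identical patterns; the only difference is which scalar they read pos() from.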


    The PerlMonk tr/// Advocate
Re: Bizarreness in ?PATTERN? and g
by Anonymous Monk on Jun 04, 2004 at 05:06 UTC
    Sorry about the previous post, didn't read the FAQ

    here it is PROPERLY FORMATTED

    perl -le '$s="a1, a3 - a5\na11"; while ($i<10) { $i++; $s=~?a(\d+)?g; print $1; }'
    prints 1, 10 times

    perl -le '$s="a1, a3 - a5\na11"; while ($i<10) { $i++; $s=~/a(\d+)/g; print $1; }'
    prints 1 3 5 11 11 1 3 5 11 11

    vs.

    perl -le '$s="a1, a3 - a5\na11"; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1; $s=~?a(\d+)?g; print $1;'
    which prints 1 3 5 11

    so the CODE ??g repeated in the SOURCE seems to behave differently than for (1..4) { ??g }, i.e. the CODE ??g EXECUTED multiple times in the code.

    I am confused.

Re: Bizarreness in ?PATTERN? and g
by integral (Hermit) on Jun 04, 2004 at 15:37 UTC

    As has already been said, a successful match in scalar context with /g doesn't 'reset' pos(), so the next match with /g on that variable will start from that position.

    The key error in your understanding is that "they're two completely separate regexen". They're not because they're connected by pos() since they have /g.

    The first examples, using ??, work just the same as the later case with m//, with the once-only nature of ?? being just an additional constraint. ?? works exactly like m// with regard to setting pos() and starting from pos() in scalar context with /g.

    So the key point is that the position of the last match is being carried over between the statements, which you weren't expecting.

    --
    integral, resident of freenode's #perl
    
Re: Bizarreness in ?PATTERN? and g
by Anonymous Monk on Jun 04, 2004 at 18:30 UTC
    Hi all. japhy speaking. Here's the run-down:
    • The regex variables are not reset or changed after a failed regex.
    • The /g modifier, in scalar context, tells the regex to match once, and then update pos($str) to wherever the regex ended in the string. The next time that string is matched against by a regex with the /g flag, the regex will start looking NO SOONER than pos($str) in the string. pos() is not tied to a regex, it's tied to a string.
    • A m?? regex matches only once (in between calls to reset()). That behavior is tied to THAT specific regex.
    • Perl is compiled. That means /x/ for 1, 2; is different from /x/; /x/;.
    Put it all together and you have this fact:
    # code 1
    $str = "abc";
    for (1, 2, 3) { print $1 if $str =~ ?(.)?g; }

    # code 2
    $str = "abc";
    print $1 if $str =~ ?(.)?g;
    print $1 if $str =~ ?(.)?g;
    print $1 if $str =~ ?(.)?g;
    The first code only prints 'a'. The second code prints 'abc'. This is because the first code has only one PMOP (Perl's internal representation of a pattern match operation), whereas the second code has THREE of them. Each PMOP has its own flags, such as the "I'm a m?? regex" flag.

    Now for a bit of fun. What does this code print?

    $str = "abc"; for (1, 2, 3) { $str =~ ?(.)?g; print $1; }
    Does it print "a" (and then two empty strings)? No. Why not? Because the regex variables ($1, et al.) are in a *slightly* larger scope than you'd expect: they retain their values for the duration of that for loop. It would be similar to saying:
    $str = "abc";
    {
        local ($_1, $_2, ...);
        for (1, 2, 3) {
            $str =~ ?(.)?g and ($_1, $_2, ...) = ($1, $2, ...);
            print $_1;
        }
    }
    except, of course, that you don't have to.
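For readers on a newer Perl, where the bare ?PATTERN? form no longer parses (it was removed in 5.22), the "code 1" versus "code 2" comparison can be sketched with the explicit m?PATTERN? spelling. This is an illustrative rewrite, not the original poster's code: the outputs are collected into the made-up variables $looped and $repeated instead of being printed directly.

```perl
use strict;
use warnings;

my ($looped, $repeated) = ('', '');

# One m?? operator executed three times: it can succeed only once,
# so only the first pass captures anything.
my $str = "abc";
for (1 .. 3) {
    $looped .= $1 if $str =~ m?(.)?g;
}

# Three separate m?? operators in the source: each may succeed once,
# and /g carries pos($str2) forward from one match to the next.
my $str2 = "abc";
$repeated .= $1 if $str2 =~ m?(.)?g;
$repeated .= $1 if $str2 =~ m?(.)?g;
$repeated .= $1 if $str2 =~ m?(.)?g;

print "$looped\n";      # a
print "$repeated\n";    # abc
```

The once-only state belongs to each match operator in the compiled code, while the position belongs to the string, which is why the second form walks through "abc".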