PerlMonks  

/g matches not really global in scalar context!

by blazar (Canon)
on Nov 14, 2005 at 11:42 UTC ( [id://508283]=perlmeditation )

In a private mail exchange with a friend, talking about Perl, he wrote "aiutino" (long story, Italian monks will understand!) at which point I replied with

/ \b aiutino \b (?{ warn "Argh!" }) /gix;
Now I know that the extended regex (?{ code }) feature is still deemed experimental. But I really wanted to use it both for the sake of illustrating it and because IMHO it strongly stresses the idea that each use of the "incriminated" word is strictly associated with the emission of an "Argh!" warning.

To be sure, before sending the mail, I tried the code, and much to my surprise only one warning was emitted even when a line contained several instances of "aiutino".

So I tried

() = / \b aiutino \b (?{ warn "Argh!" }) /gix;
instead, and it did work as expected. Now it seems that the /g modifier on the match operator doesn't make it really global in scalar (or void) context, which could be because of an optimization effect, since for ordinary matches it doesn't really make a difference.
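A minimal sketch of the difference, assuming a counter in place of warn so that the result can be inspected. The sample string and the variable names are made up for illustration:

```perl
use strict;
use warnings;

my $text = "aiutino, aiutino, AIUTINO!";   # hypothetical sample line

# Scalar context: a /g match stops after the first success.
my $scalar_hits = 0;
my $scalar_copy = $text;
my $found = $scalar_copy =~ / \b aiutino \b (?{ $scalar_hits++ }) /gix;

# List context, forced by the empty list assignment:
# the engine keeps going until no further match is found.
my $list_hits = 0;
my $list_copy = $text;
() = $list_copy =~ / \b aiutino \b (?{ $list_hits++ }) /gix;

print "scalar context: $scalar_hits, list context: $list_hits\n";
```

Separate copies of the string are used so that the position left behind by the first match cannot influence the second one.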

Indeed, in quite a few years of Perl programming I had never realized this was the case. But if one deliberately specifies the /g modifier, even in scalar context, then one should get what it promises; and experimental as they may be, the extended regex features that execute code are a good reason to guarantee it.

Any thoughts?

Replies are listed 'Best First'.
Re: /g matches not really global in scalar context!
by demerphq (Chancellor) on Nov 14, 2005 at 12:30 UTC

    This behaviour is intended and documented. Consider the following:

    for ("I am a fish", "I am a nonfish", "I am a nonfish", "fish I am a") {
        print "Start> $_\n";
        if (/\GI\s*/g) {
            print "I\n";
            if (/\Gam a\s*/g) {
                print "am a\n";
                if (/\Gfish/g) {
                    print "Yah! I am a fish\n";
                }
            }
        }
    }

    In other words, /g in a scalar context allows multiple distinct regexes to match after each other but from the place where the previous left off. This is very useful behaviour and is well worth familiarizing yourself with.
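The "place where the previous left off" is exposed through pos(). The following is only a sketch under the assumption of a single made-up input string; it records pos() after each chained match and shows that a failed /g match (without /c) resets the position:

```perl
use strict;
use warnings;

my $s = "I am a fish";   # hypothetical input
my @trace;

# Each successful scalar-context /g match advances pos($s);
# \G anchors the next pattern at exactly that position.
push @trace, pos($s) if $s =~ /\GI\s*/g;
push @trace, pos($s) if $s =~ /\Gam a\s*/g;
push @trace, pos($s) if $s =~ /\Gfish/g;

print "positions after each match: @trace\n";

# A failing /g match without /c resets pos() to undef.
my $matched = $s =~ /\Gnonfish/g;
print "pos after failure: ", (defined pos($s) ? pos($s) : "undef"), "\n";
```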

    BTW, part of the reason that those constructs are marked experimental is because they allow sideeffects, which means that if the behaviour of the regex engine changes there is no guarantee the same side effects will occur. A good example is the following regex:

    $_ = "aiutino aiutino aiutino fnorbley";
    / \b aiutino \b \s* (?{ warn "Argh!" }) fnorbletrx /gix;

    Note that it doesn't warn AT ALL. The reason is that the longest fixed string in the pattern is 'fnorbletrx', but the target string does not contain that substring, and so the regex never runs as a regex. Which means the warn is never executed. In fact this pattern applied to this string behaves more like /.../ if index($_,'fnorbletrx') >= 0; It's worth remembering that this is a very common optimisation. Any regex that involves a mandatory constant string will automatically use this logic internally.

    ---
    $world=~s/war/peace/g

      This behaviour is intended and documented. Consider the following:

      [SNIP]

      In other words, /g in a scalar context allows multiple distinct regexes to match after each other but from the place where the previous left off. This is very useful behaviour and is well worth familiarizing yourself with.
      Indeed I was about to reply that if it is documented, then it is in some well hidden place, but checking perldoc perlop to be sure I found that it is very well documented!! So I wonder how I could fail to notice it for such a long time, but now that I know, even if it seems I've not really needed it till now, I fully agree that while slightly counter-intuitive, it's such a useful feature that that tiny ()= is a cheap trade-off...
Re: /g matches not really global in scalar context!
by Aristotle (Chancellor) on Nov 14, 2005 at 13:33 UTC

    This is fully intended. It’s there so you can say

    while( m{ \b aiutino \b }gix ) { warn "Argh!" }

    Makeshifts last the longest.

      For the record, I didn't use this example as it would be just too perlish to special-case /g inside of a while loop. I used the non-while version to show that this is the general case, and not a consequence of using a while. That way people don't associate this use of /g with a while loop, especially as 99.9999% of the time one finds it in such a context.

      ---
      $world=~s/war/peace/g

        To chain matches, you’d usually use /gc, not just /g. Plain /g in scalar context is almost always associated with repeated match attempts for the same pattern.
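As a rough sketch of chaining with /gc: the toy tokenizer below tries several \G-anchored patterns in turn. The input string and the token names are invented for illustration. With /c, a failed attempt leaves pos() where it was, so the next alternative starts from the same place:

```perl
use strict;
use warnings;

my $src = "x = 42 + y";   # hypothetical input
my @tokens;

while (1) {
    # Each pattern is anchored with \G at the current pos($src).
    # Because of /c, a failure does not reset that position.
    if    ($src =~ /\G\s+/gc)            { next }
    elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, "IDENT($1)" }
    elsif ($src =~ /\G(\d+)/gc)          { push @tokens, "NUM($1)" }
    elsif ($src =~ /\G([=+])/gc)         { push @tokens, "OP($1)" }
    else                                 { last }   # nothing matched: stop
}

print join(" ", @tokens), "\n";
```

With plain /g instead of /gc, the first failing alternative would reset pos() and the next pattern would start again from the beginning of the string.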

        Makeshifts last the longest.

Re: /g matches not really global in scalar context!
by jonix (Friar) on Nov 14, 2005 at 13:56 UTC
    The reason behind this behaviour of /g (returning only one match at a time) becomes even more clear to me with a for loop:
    use warnings;
    use strict;
    my $x = "aiutino aiutino argh! aiutino";
    warn "Argh, one time"    for $x =~ / \b aiutino \b /ix;
    warn "Argh, three times" for $x =~ / \b aiutino \b /gix;
    jonix

      Don’t confuse yourself. foreach puts its expression into list context, so a /g match actually returns a list of all matches at once; of course the foreach then iterates over them one by one. The loop construct you want is while, which evaluates its condition in scalar context, so a /g match as the condition will return matches one by one.
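To make the distinction concrete, here is a small sketch (variable names are illustrative only) that runs the same pattern once as a while condition and once in list context:

```perl
use strict;
use warnings;

my $x = "aiutino aiutino argh! aiutino";

# Scalar context: one match per evaluation of the condition;
# pos($x) records where each match ended.
my @ends;
while ($x =~ / \b aiutino \b /gix) {
    push @ends, pos($x);
}

# List context: all matches are returned at once.
my @all = $x =~ / \b aiutino \b /gix;

print "match end positions: @ends\n";
print "matches in list context: ", scalar(@all), "\n";
```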

      Makeshifts last the longest.

        Thanks for making this clear. Is there a way to get all those /g matches at once without a loop?
        OK just found it:
        use warnings;
        use strict;
        my $x = "aiutino aiutino argh! aiutino";
        my @matches = $x =~ / \b aiutino \b /gix;
        print "$_\n" for @matches;
        so it is depending on scalar vs. list context what you will get.
        Thanks again,
        jonix

      Huh?!?

      I think you misunderstood me! I very often use the /g modifier in list context:

      my $n =()= / \b aiutino \b /gix; warn "Argh, $n times";
      or
      warn "$_ => Argh!\n" for / \b aiutino \b /gix; # Ok: not that interesting with a fixed length string!

      Update: in the code above I did s/smart/interesting/, as per Aristotle's remark: indeed the latter adjective better reflects my feeling - I meant that $_ saves me some typing but will always have the value 'aiutino', and it would be more interesting with a more complex regex. Well, thinking of it better, not quite always 'aiutino', because I used /i, but still there ought to be more interesting cases...

        not that smart with a fixed length string!

        Why not? Firstly, you’re using \b, which condition you’d have to code manually if you were to use a string search – that would require a lot of extra code and would be slower. Secondly, the regex engine is smart enough to do a simple string search itself when it sees simple patterns like yours.

        So absolutely, you should use a pattern for the job you’re after.

        Makeshifts last the longest.

Node Type: perlmeditation [id://508283]
Approved by Corion
Front-paged by rinceWind