in reply to Re: Greediness of * vs. greediness of +
in thread Greediness of * vs. greediness of +

In other words (I thought) /(b*)/ stops after the first failure, at the start of the string,

s/failure/success/

whereas adding the /g would tell the regex engine to keep on trying until it reaches the end of the string.

What /g does on a regex depends on context. In scalar (boolean) context it matches once and stores the position where the match ended in pos(). If you execute it a second time, it starts off from where it left off.

The background is that you can write

while (/(b*)/g) { ... }

and get a new match for each iteration:

$ perl -e '$_="abbbabc"; while (/(b*)/g) { print "($1)\n" }'
()
(bbb)
()
(b)
()
()
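To make the context dependence visible, here is a small sketch that records pos() after every scalar-context match and then runs the same pattern once in list context. The variable names ($str, @steps, @all) are only for illustration.

```perl
use strict;
use warnings;

my $str = "abbbabc";

# Scalar context: each evaluation finds one match and records its end in pos().
my @steps;
while ($str =~ /(b*)/g) {
    push @steps, pos($str) . ":($1)";
}

# List context: the same pattern returns every match in a single call.
my @all = $str =~ /(b*)/g;

print "$_\n" for @steps;
print scalar(@all), " matches in list context\n";
```

Note that pos() shows the same value for a non-empty match and the empty match that follows it (4 twice, then 6 twice), which is the zero-width case discussed further down.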

Update: Answer to the second question

That this code produces two replacements for the string of four 'b's remains a puzzle. Why does this appear (this may be my error) that the regex conflates two 'b's rather than all four?

A naive substitution implementation would loop forever on s/b*/^/g, because it would keep replacing the empty string with ^ at the same position.

Perl is a bit more sophisticated: it detects a zero-width match, and before doing a second substitution of a zero-width match at the same position, it bumps along and tries at the next position.

So applying s/b*/^/g to abba takes these steps:

abba    | match zero b's before a
^abba   | match zero b's again. Don't substitute here, bump along
^abba   | match 'bb'
^a^a    | match zero b's
^a^^a   | match zero b's, don't substitute but bump along
^a^^a^  | match zero b's, don't substitute but bump along

You can watch it work; I didn't find a way to get the modified string, but at least you can monitor the match positions:

$ perl -le '$_ = "abba"; s/b*/print pos; "^"/eg; print'
0
1
3
4
^a^^a^
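As a variation on the one-liner, the match offsets can also be read from the @- and @+ arrays inside the replacement. This is only a sketch; $once, $all and @spans are names made up for the example, and it also shows the result without /g for comparison.

```perl
use strict;
use warnings;

my $str = "abba";

# Without /g: only the first (empty) match at offset 0 is replaced.
(my $once = $str) =~ s/b*/^/;

# With /g: record the start and end offset of every match that gets replaced.
my @spans;
(my $all = $str) =~ s{b*}{ push @spans, $-[0] . "-" . $+[0]; "^" }eg;

print "once: $once\n";
print "all:  $all\n";
print "spans: @spans\n";
```

A span whose start and end are equal is a zero-width match; the single non-empty span covers both b's at once.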

Re^3: Greediness of * vs. greediness of +
by JavaFan (Canon) on Sep 08, 2010 at 13:57 UTC
    If you execute it a second time, it starts off from where it left off.
    If that were true, while (/(b*)/g) would never finish. It starts where it finished the previous time, unless it matched an empty string. In the latter case, pos() will be advanced by one. (The details are even more complicated, but documented: see the section "Repeated Patterns Matching a Zero-length Substring" in the perlre manual page.)
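    One way to see that there is extra state besides the pos() value is the behaviour described in that perlre section: the "matched with zero length" flag is reset by an assignment to pos(). The sketch below relies on that; the counters and the safety limit of 10 are arbitrary choices for the demo.

```perl
use strict;
use warnings;

my $str = "abc";

# Normal loop: each empty match is allowed once per position, so it terminates.
my $normal = 0;
$normal++ while $str =~ /x*/g;

# Assigning to pos() clears the zero-length flag, so the match at offset 0
# repeats; the counter is only a safety stop for this demo.
my $stuck = 0;
pos($str) = undef;
while ($str =~ /x*/g) {
    pos($str) = pos($str);
    last if ++$stuck >= 10;
}

print "normal: $normal, stuck at pos ", pos($str), " after $stuck tries\n";
```

    Without the safety stop the second loop would not end, which is the situation the zero-length rule exists to prevent.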