Re: Greediness of * vs. greediness of +

This is a "further" question re the replies from moritz and Marshall.

And in following their answers, I found it helpful to remember that "greedy" is not the same as "global." While /(b*)/ is greedy, it is not global the way /(b*)/g is. In other words (I thought), /(b*)/ stops after the first ~~failure~~ success (moritz caught this) at the start of the string, whereas adding /g would tell the regex engine to keep on trying until it reaches the end of the string.

Well, that was my second thought.

But, OOOPS,

"abbbbc"=~/(b*)/g && print "Found: $1";
# Found:

Huh?

Well, is this a case where the rules are different in substitution?

my $string1 = "abbbbc";

my $found1 = $string1 =~ s/(b*)/^/g;

print "\$string1: $string1, \$found1: $found1";   # $found1 is 4

=head

after s///  $string: ^a^^c^
At begin of string
    ...no "b" found |         # satisfies "0 or more 'b's"
         "a" (duh!)  |
     Two "b"(s) found |       # likewise; the first and second 'b's are conflated?
         and again     |
             "c"        |
     no "b" after "c"    |
=cut

That this code produces two replacements for the string of four 'b's remains a puzzle. Why does it appear (this may be my error) that the regex conflates two 'b's rather than all four?

Enlightenment?
or Coffee?

Update: Wonderful answer below. s/failure/success/ per moritz; italics closed per ssandv.

Re^2: Greediness of * vs. greediness of + by moritz (Cardinal) on Sep 08, 2010 at 13:07 UTC
> In other words (I thought) /(b*)/ stops after the first failure, at the start of the string,

s/failure/success/

> whereas adding the /g would tell the regex engine to keep on trying until it reaches the end of the string.

What /g does on a regex depends on context. In ~~boolean~~ scalar context it matches once, and stores the position in pos. If you execute it a second time, it starts off from where it left off. The background is that you can write `while (/(b*)/g) { ... }` and get a new match for each iteration:

    $ perl -e '$_="abbbabc"; while (/(b*)/g) { print "($1)\n" }'
    ()
    (bbb)
    ()
    (b)
    ()
    ()

Update: Answer to the second question

> That this code produces two replacements for the string of four 'b's remains a puzzle. Why does this appear (this may be my error) that the regex conflates two 'b's rather than all four?

A naive substitution implementation would loop forever on `s/b*/^/g`, because it would keep replacing the empty string with ^. Perl is a bit more sophisticated: it detects a zero-width match, and before doing a second substitution of a zero-width match at the same position it bumps along and tries at the next position.

So applying `s/b*/^/g` to `abba` takes these steps:

    abba    | match zero b's before a
    ^abba   | match zero b's again. Don't substitute here, bump along
    ^abba   | match 'bb'
    ^a^a    | match zero b's
    ^a^^a   | match zero b's, don't substitute but bump along
    ^a^^a^  | match zero b's, don't substitute but bump along

You can watch it work; I didn't find a way to get the modified string, but at least you can monitor the match positions:

    $ perl -le '$_ = "abba"; s/b*/print pos; "^"/eg; print'
    0
    1
    3
    4
    ^a^^a^
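As a rough illustration of the steps described above, the following sketch records where each match of /b*/ starts and ends on "abba" and then runs the substitution on a copy. The variable names ($text, @spans, $replaced) are made up for this example, and it assumes that a scalar-context m//g loop visits the same matches that s///g does.

```perl
use strict;
use warnings;

my $text = "abba";

# Record the start and end offset of every match of /b*/ under /g.
# $-[0] and $+[0] hold the offsets of the most recent successful match.
my @spans;
while ($text =~ /b*/g) {
    push @spans, sprintf("%d-%d", $-[0], $+[0]);
}

# Run the substitution on a copy so the original string is left alone.
(my $replaced = $text) =~ s/b*/^/g;

print "spans:  @spans\n";     # spans:  0-0 1-3 3-3 4-4
print "result: $replaced\n";  # result: ^a^^a^
```

Each span with equal start and end is a zero-width match; the single 1-3 span is the greedy match that swallows both b's at once, which is why only one caret replaces them.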
Re^3: Greediness of * vs. greediness of + by JavaFan (Canon) on Sep 08, 2010 at 13:57 UTC
> If you execute it a second time, it starts off from where it left.

If that were true, `while (/(b*)/g)` would never finish. It starts where it finished the previous time, unless it matched an empty string. In the latter case, `pos()` will be advanced by one. (Details are even more complicated. But documented. See the section "Repeated Patterns Matching a Zero-length Substring" in the `perlre` manual page.)
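To see why the loop terminates, here is a small sketch (hypothetical variable names, same "abbbabc" input as the one-liner earlier in the thread) that logs what was captured and where pos() sits after each iteration.

```perl
use strict;
use warnings;

my $text = "abbbabc";
my (@matches, @positions);

# Scalar-context //g: each iteration resumes from the stored pos($text).
while ($text =~ /(b*)/g) {
    push @matches,   $1;
    push @positions, pos($text);   # offset where this match ended
}

print "matches:   ", join(" ", map { "($_)" } @matches), "\n";
print "positions: @positions\n";
# matches:   () (bbb) () (b) () ()
# positions: 0 4 4 6 6 7
```

The repeated offsets hint at the "more complicated" details: after an empty match the reported pos() stays put, and it is the next attempt that is not allowed to match empty at that same spot, so the engine has to move forward.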