in reply to Re: Greediness of * vs. greediness of +
in thread Greediness of * vs. greediness of +
In other words (I thought) /(b*)/ stops after the first failure, at the start of the string,
s/failure/success/
whereas adding the /g would tell the regex engine to keep on trying until it reaches the end of the string.
What /g does on a regex depends on context. In scalar (boolean) context it matches once and stores the end position of the match in pos. If you execute it a second time, it starts off from where it left off.
The background is that you can write
while (/(b*)/g) { ... }
and get a new match for each iteration:
$ perl -e '$_="abbbabc"; while (/(b*)/g) { print "($1)\n" }'
()
(bbb)
()
(b)
()
()
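To make the context difference concrete, here is a small sketch (the variable names are just mine for illustration) that runs the same regex once in list context and then repeatedly in scalar context, recording pos after each successful match:

```perl
use strict;
use warnings;

my $str = "abbbabc";

# List context: /g returns the captures of every match in one go.
my @all = $str =~ /(b*)/g;
print scalar(@all), " matches: ", join(" ", map { "($_)" } @all), "\n";

# Scalar context: one match per evaluation; pos($str) remembers
# where the next attempt should resume.
my @positions;
while ($str =~ /(b*)/g) {
    push @positions, pos($str);
    print "matched '$1', pos is now ", pos($str), "\n";
}
# @positions ends up as (0, 4, 4, 6, 6, 7)
```

Note how pos stays at 4 and at 6 for two consecutive matches: first a non-empty match ends there, then an empty match is found at that same spot.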
Update: Answer to the second question
That this code produces two replacements for the string of four 'b's remains a puzzle. Why does this appear (this may be my error) that the regex conflates two 'b's rather than all four?
A naive substitution implementation would loop forever on s/b*/^/g, because it would keep replacing the empty string with ^ at the same position.
Perl is a bit more sophisticated: it detects a zero-width match, and before doing a second substitution of a zero-width match at the same position it bumps along and tries at the next position.
So applying s/b*/^/g to abba takes these steps:
abba    | match zero b's before a
^abba   | match zero b's again. Don't substitute here, bump along
^abba   | match 'bb'
^a^a    | match zero b's
^a^^a   | match zero b's, don't substitute but bump along
^a^^a^  | match zero b's, don't substitute but bump along
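The rule can also be spelled out by hand. What follows is only a sketch of the behaviour described above, not how the regex engine is actually implemented; emulate_subst is a made-up helper that hard-codes the pattern b* and the replacement ^:

```perl
use strict;
use warnings;

# Hypothetical helper: emulates s/b*/^/g without using s///.
sub emulate_subst {
    my ($in) = @_;
    my ($out, $pos, $prev_empty) = ('', 0, 0);
    my $len = length $in;

    while ($pos <= $len) {
        # Greedy b*: count how many 'b's start at $pos.
        my $n = 0;
        $n++ while $pos + $n < $len && substr($in, $pos + $n, 1) eq 'b';

        if ($n == 0 && $prev_empty) {
            # A second zero-width match at the same position:
            # don't substitute, copy one character and bump along.
            $out .= substr($in, $pos, 1) if $pos < $len;
            $pos++;
            $prev_empty = 0;
            next;
        }

        $out .= '^';            # substitute the matched run (possibly empty)
        $pos += $n;
        $prev_empty = ($n == 0);
    }
    return $out;
}

print emulate_subst("abba"), "\n";   # ^a^^a^
print emulate_subst("bbbb"), "\n";   # ^^
```

The second call shows where the two replacements for a string of four b's come from: the first ^ replaces all four b's, and the second replaces the empty string that still matches at the end of the string.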
You can watch it work; I didn't find a way to get the modified string, but at least you can monitor the match positions:
$ perl -le '$_ = "abba"; s/b*/print pos; "^"/eg; print'
0
1
3
4
^a^^a^
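Another way to watch the matches, offered as a sketch rather than the one true method, is to record the start and end offset of each match from the special arrays @- and @+ inside the /e replacement. The names @spans and $result are mine:

```perl
use strict;
use warnings;

my $str = "abba";
my @spans;

# Work on a copy so $str stays intact; $-[0] and $+[0] are the start
# and end offsets of the current match.
(my $result = $str) =~ s{b*}{ push @spans, [ $-[0], $+[0] ]; "^" }eg;

printf "match from %d to %d\n", @$_ for @spans;
print "$result\n";
```

A zero-width match shows up as a span whose start and end are equal, so you can tell the empty matches apart from the one that consumed 'bb'.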
Re^3: Greediness of * vs. greediness of +
by JavaFan (Canon) on Sep 08, 2010 at 13:57 UTC