in reply to Re^2: zero-length match increments pos() (saner)
in thread zero-length match increments pos()

The thought plickens...

I wanted to add some more examples to make sure the point was clear, so I needed a handy copy of sed. I eventually turned to my Zaurus (since I was in bed) and produced:

$ echo bbbaaabbb | sed -e 's/\(b*\)/(\1)/g'
(bbb)()a()a()a(bbb)

which added a new point on the speculum (ducks¹).

I eventually calmed down, convinced myself it was just a quirk of busybox's imitation of sed, found a real copy of sed on FreeBSD, and produced:

$ echo bbbaaabbb | sed -e 's/\(b*\)/(\1)/g'
(bbb)a()a()a(bbb)

Compare that to Perl:

$ echo bbbaaabbb | perl -pe 's/(b*)/(\1)/g'
(bbb)()a()a()a(bbb)()
()
$ echo bbbaaabbb | perl -lpe 's/(b*)/(\1)/g'
(bbb)()a()a()a(bbb)()

So we see that the ancient lords of s///g, sed and vi(ex), agree that it doesn't make sense for two successive matches to end at the same point.

We also see how easy it is to overlook this point. The authors of busybox (or of the regex library it uses) realized that once you reach the end of the string you are done, but not that it makes no more sense for two successive matches to end at the same point anywhere else: (bbb)()a()a()a(bbb)
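To make the two rules concrete, here is a minimal sketch in Python (not Perl, and not how sed or perl is actually implemented) of an s///g-style loop that wraps each match in parentheses. `wrap_matches` and its `sed_rule` flag are hypothetical names. The sketch assumes a greedy pattern like b*, where an empty match at a position means no longer match exists there, so it never tries the non-empty alternatives a real engine would backtrack into after rejecting an empty match.

```python
import re


def wrap_matches(pattern, s, sed_rule):
    """Wrap every match of `pattern` in `s` in parentheses, s///g style.

    Without sed_rule: an empty match is accepted right after a non-empty
    match (the Perl 5 result); after taking an empty match we step over
    one character so the same empty match is never repeated.
    With sed_rule: an empty match that ends where the previous match
    ended is rejected too (the sed/vi result).

    Assumes a greedy pattern such as b*, where an empty match at a
    position means no longer match exists there.
    """
    rx = re.compile(pattern)
    out = []
    pos = 0
    prev_end = None  # end offset of the previous accepted match
    while pos <= len(s):
        m = rx.match(s, pos)  # try to match starting exactly at pos
        if m is not None:
            empty = m.end() == pos
            if not (sed_rule and empty and prev_end == pos):
                out.append("(" + m.group(0) + ")")
                prev_end = m.end()
                if not empty:
                    pos = m.end()  # resume right after a non-empty match
                    continue
        # No acceptable match here, or an empty one was just taken:
        # copy one character and step forward.
        if pos < len(s):
            out.append(s[pos])
        pos += 1
    return "".join(out)


print(wrap_matches(r"b*", "bbbaaabbb", sed_rule=False))  # (bbb)()a()a()a(bbb)()
print(wrap_matches(r"b*", "bbbaaabbb", sed_rule=True))   # (bbb)a()a()a(bbb)
```

Without the rule it reproduces the perl -lpe output; with it, the FreeBSD sed output. Neither setting reproduces the busybox output, which applies the rule only at the end of the string.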

So I'm sure Perl6 will need to support a Perl5-compatible mode, but it'd be nice if it also supported a sed / vi / saner mode (and, personally, I'd make that the default; the Perl5 mode has even been called a "bug" right here at PerlMonks more than once, and not only by me).

While thinking about this, I also envisioned a fun 'watch me backtrack' mode.

It would be useful for teaching regular expressions. In this mode, matching 'bbaabb' =~ /b*/g would return the following matches in the following order:

[bbaabb]
(bb)
(b)
()
(b)
()
.()
..()
...(bb)
...(b)
...()
....(b)
....()
.....()
[bbaabb]

while matching 'bbaabb' =~ /b*?/g would return the following matches in the following order:

[bbaabb]
()
(b)
(bb)
()
(b)
.()
..()
...()
...(b)
...(bb)
....()
....(b)
.....()
[bbaabb]
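Here is a minimal Python sketch of what such a mode would list. It simplifies by assuming the pattern is a single character followed by * or *?, so "every match the engine could produce" is just every run length at every start offset. Greedy tries the longest first and lazy the shortest first. `backtrack_trace` is a hypothetical name. It pads each entry with one dot per character before the match's start and leaves out the [bbaabb] frame.

```python
def backtrack_trace(s, ch, lazy=False):
    """Every way ch* (or ch*? when lazy) can match s, in the order a
    backtracking engine tries them: start offsets left to right, and at
    each offset longest-first (greedy) or shortest-first (lazy).
    Each entry is prefixed with one '.' per character before its start."""
    trace = []
    for start in range(len(s) + 1):
        run = 0  # length of the run of `ch` beginning at `start`
        while start + run < len(s) and s[start + run] == ch:
            run += 1
        lengths = range(run + 1) if lazy else range(run, -1, -1)
        for n in lengths:
            trace.append("." * start + "(" + s[start:start + n] + ")")
    return trace


print(" ".join(backtrack_trace("bbaabb", "b")))
# (bb) (b) () .(b) .() ..() ...() ....(bb) ....(b) ....() .....(b) .....() ......()
print(" ".join(backtrack_trace("bbaabb", "b", lazy=True)))
# () (b) (bb) .() .(b) ..() ...() ....() ....(b) ....(bb) .....() .....(b) ......()
```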

- tye        

¹ That's enough to make a Welsh Harlequin blush.

Re^4: zero-length match increments pos() (saner)
by demerphq (Chancellor) on Nov 09, 2006 at 00:39 UTC

    Do we need a mode for this? Getting all the matches? It's possible with an embedded code block, as I think you know :-)

    perl -le"$_='bbaabb'; /b*(?{print '.' x $-[0],qq<($&)>})(*FAIL)/g"
    (bb)
    (b)
    ()
    .(b)
    .()
    ..()
    ...()
    ....(bb)
    ....(b)
    ....()
    .....(b)
    .....()
    ......()

    On perls older than mine, you can spell (*FAIL) as (?!).

    ---
    $world=~s/war/peace/g
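The trick in that one-liner is that the code block runs every time the engine reaches it with a complete match of b*, and (*FAIL) then rejects that match. The engine backtracks into the next alternative and eventually bumps along to the next start offset. Here is a minimal Python sketch of the same record-then-fail idea. `compile_tokens` and `all_matches` are hypothetical names, and the "engine" is a toy continuation-passing matcher that only understands single characters optionally followed by * or *?, not Perl's regex engine.

```python
def compile_tokens(tokens):
    """Tiny continuation-passing backtracking matcher (illustrative only).

    tokens is a list of (char, quant) pairs, quant being "" (exactly one),
    "*" (greedy) or "*?" (lazy). The returned match(s, i, k) calls k(end)
    for each way the whole sequence can match s starting at offset i, in
    backtracking order, and stops early only if k returns True.
    """
    def match_from(t, s, i, k):
        if t == len(tokens):
            return k(i)  # whole pattern matched: hand the end offset to k
        ch, quant = tokens[t]
        if quant == "":
            return i < len(s) and s[i] == ch and match_from(t + 1, s, i + 1, k)
        run = 0
        while i + run < len(s) and s[i + run] == ch:
            run += 1
        order = range(run + 1) if quant == "*?" else range(run, -1, -1)
        return any(match_from(t + 1, s, i + n, k) for n in order)

    return lambda s, i, k: match_from(0, s, i, k)


def all_matches(tokens, s):
    """Record every match, then fail, so the matcher backtracks into the next."""
    match = compile_tokens(tokens)
    seen = []
    for start in range(len(s) + 1):  # bump-along: try each start offset
        def record_then_fail(end, start=start):
            seen.append("." * start + "(" + s[start:end] + ")")
            return False  # plays the role of (*FAIL) / (?!)
        match(s, start, record_then_fail)
    return seen


print(" ".join(all_matches([("b", "*")], "bbaabb")))
# (bb) (b) () .(b) .() ..() ...() ....(bb) ....(b) ....() .....(b) .....() ......()
```

The same loop also works with more than one token. For example, all_matches([("a", "*"), ("b", "")], "aab") reports (aab), .(ab) and ..(b).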

Re^4: zero-length match increments pos() (saner)
by hv (Prior) on Feb 23, 2005 at 14:27 UTC

    I also envisioned a fun 'watch me backtrack' mode.

    These are precisely the matches that will be returned by another option, which I think was called ':exhaustive'.

    I expect that option also to be very useful for combinatorial exercises.

    Hugo