When this second item has come up before, it has been defended as being the correct behavior. The more general case is that when a regex can match a zero-width string, it is possible for multiple matches to end at the same point.

Another example is:

$str = "ababa"; $str =~ s/a*/x/g; print "$str\n";

which produces

xxbxxbxx

This is because we start at position 0 and match "a", leaving us at position 1. At position 1 we match "", and then move on to position 2 (we've already started a match at position 1, so we don't start there again, even though our match ended at position 1). At pos 2 we match "a", at pos 3 we match "", etc.
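To see those positions directly, here is a small sketch that walks the same pattern with m//g and records where each match starts and ends, using the built-in @- and @+ offset arrays. The variable name @spans is just mine for illustration.

```perl
use strict;
use warnings;

my $str = "ababa";
my @spans;

# Each successful /g match in scalar context resumes from pos($str).
while ($str =~ /a*/g) {
    # $-[0] is the start offset of the whole match, $+[0] is its end offset.
    push @spans, sprintf("[%d,%d)", $-[0], $+[0]);
}

print join(" ", @spans), "\n";
# prints: [0,1) [1,1) [2,3) [3,3) [4,5) [5,5)
```

Each "a" is followed by an empty match that starts exactly where the "a" match ended, which is where the doubled x's come from.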

But this is a bit counterintuitive. In fact, sed doesn't have this "quirk". So it might be a good idea to disallow zero-width matches that start (and therefore end) at the point where the previous match ended.
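As a rough illustration of what that rule would do, here is a sketch that emulates it in plain Perl. It is not a proposal for how the regex engine should implement it. The helper name subst_no_adjacent_empty is made up, and it only handles a literal replacement string (no $1 and friends).

```perl
use strict;
use warnings;

# Hypothetical helper: like s/$re/$repl/g, but skips any zero-width match
# that starts at the offset where the previously accepted match ended.
sub subst_no_adjacent_empty {
    my ($str, $re, $repl) = @_;
    my $out      = '';
    my $copied   = 0;    # how much of $str has been consumed so far
    my $last_end = -1;   # end offset of the previously accepted match

    while ($str =~ /$re/g) {
        my ($s, $e) = ($-[0], $+[0]);
        next if $s == $e && $s == $last_end;   # the "quirk" case: ignore it
        $out .= substr($str, $copied, $s - $copied) . $repl;
        $copied   = $e;
        $last_end = $e;
    }
    return $out . substr($str, $copied);
}

print subst_no_adjacent_empty("ababa", qr/a*/, "x"), "\n";   # prints: xbxbx
```

With the adjacent empty matches dropped, "ababa" comes out with one x per run of a's rather than two.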

But that raises the ugly spectre of backward compatibility... My current feeling is that "we" should "fix" this but provide a way to get the old behavior to ease the burden of backward compatibility (though no suitable syntax/feature for doing that springs to mind). I suspect a lack of to-its will cause the current behavior to remain until someone feels strongly enough about it to champion its cause.

        - tye (but my friends call me "Tye")

In reply to RE: Re: ^x* vs x*$ by tye
in thread ^x* vs x*$ by Carl-Joseph
