What it does is enforce that two zero length matches can't begin at the same point.

Ah. It took me a bit to figure out the crux of the difference you were pointing out, so I highlighted it above. I had indeed missed that point; thank you for raising it.

But it does more than just prevent repeats. It also enforces that the next match must start after the end of the previous match. Zero-width matches make this rule insufficient. And I find that it makes more sense to extend this idea differently than Perl does.

If we follow the useful STL convention and define "end(N)" (the end of the Nth match) to be the character after the last character in the match (so that begin(N) <= end(N) and we don't have to try to talk about the spaces between characters), then the common-sense rule boils down to end(N) <= begin(N+1).
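
To make that convention concrete, here is a minimal sketch. Perl's @- and @+ arrays already report offsets this way: $-[0] is where the last match began and $+[0] is the offset just past its last character. The helper name match_spans is hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect [begin, end] for every match of a pattern,
# where end is the offset just past the last matched character.
sub match_spans {
    my ($string, $regex) = @_;
    my @spans;
    while ($string =~ /$regex/g) {
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

my @spans = match_spans('ab cd', qr/\w+/);
print join(' ', map { "[$_->[0],$_->[1]]" } @spans), "\n";
# prints: [0,2] [3,5]
```

Here end(1) is 2 and begin(2) is 3, so end(N) <= begin(N+1) holds.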

My proposal is that the above rule be extended as:

    begin(N) <= end(N) <= begin(N+1) <= end(N+1)
    begin(N) < begin(N+1)
    end(N) < end(N+1)
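
As a rough sketch of what these three rules accept and reject, the checker below takes a list of [begin, end] pairs and tests each consecutive pair against them. The name obeys_proposal and the sample spans are assumptions made for illustration; this only checks spans, it does not change how any regex engine matches.

```perl
use strict;
use warnings;

# Hypothetical checker: true if every consecutive pair of [begin, end]
# spans obeys all three proposed rules.
sub obeys_proposal {
    my @spans = @_;
    for my $n (0 .. $#spans - 1) {
        my ($b1, $e1) = @{ $spans[$n] };
        my ($b2, $e2) = @{ $spans[$n + 1] };
        # begin(N) <= end(N) <= begin(N+1) <= end(N+1)
        return 0 unless $b1 <= $e1 && $e1 <= $b2 && $b2 <= $e2;
        # begin(N) < begin(N+1)
        return 0 unless $b1 < $b2;
        # end(N) < end(N+1)
        return 0 unless $e1 < $e2;
    }
    return 1;
}

print obeys_proposal([0, 0], [1, 1], [2, 2]) ? "ok\n" : "rejected\n";  # ok
print obeys_proposal([0, 0], [0, 1])         ? "ok\n" : "rejected\n";  # rejected
print obeys_proposal([0, 1], [1, 1])         ? "ok\n" : "rejected\n";  # rejected
```

The second call fails because both matches begin at offset 0; the third fails because both end at offset 1.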

What Perl5 does, by contrast, can't be expressed (that I can see) with such simple rules. Perhaps something like:

    begin(N) <= end(N) <= begin(N+1) <= end(N+1)
    skip N+1 if begin(N)==end(N)==begin(N+1)==end(N+1)

That is nice in one regard: it provides the maximum number of matches possible while obeying the first rule and not allowing repeats. But I don't think it is the best choice (considering "least surprise", for example).

If \w?? matches an empty string rather than "a" (because it prefers the shorter match, it being anti-greedy), then I don't expect it to go on to match "a" next; it already made its decision regarding "a" and should move on to the next decision point. My expectation is that begin(N) < begin(N+1).
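
For reference, a small sketch of what Perl 5 does with this pattern under //g. The helper name lazy_matches and the sample string 'ab' are assumptions picked for illustration. Each entry shows the matched text followed by its [begin, end] offsets.

```perl
use strict;
use warnings;

# Hypothetical helper: record the text and [begin, end] offsets of each
# successive match of /\w??/g against the given string.
sub lazy_matches {
    my ($string) = @_;
    my @found;
    while ($string =~ /\w??/g) {
        my $text = substr($string, $-[0], $+[0] - $-[0]);
        push @found, sprintf('"%s"[%d,%d]', $text, $-[0], $+[0]);
    }
    return @found;
}

print join(' ', lazy_matches('ab')), "\n";
# prints: ""[0,0] "a"[0,1] ""[1,1] "b"[1,2] ""[2,2]
```

The empty match at offset 0 is followed by "a", which also begins at offset 0, so the sequence violates begin(N) < begin(N+1).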

- tye        


In reply to Re^3: zero-length match increments pos() (two!) by tye
in thread zero-length match increments pos() by Errto
