Hello tj999, and welcome to the Monastery!

Just to elaborate on tybalt89’s ++answer: the key here is the use of a positive lookahead assertion. Like other lookaround assertions, it is zero-width: it consumes no characters, so it has no effect on where the regex engine starts looking for the next match during a global search (i.e., when the regex is in list context and has a /g modifier).

BTW, note that tybalt89’s solution omits the final comma from the regex. With the comma included, your regex will not match the 999 in a string such as 123,222,456,222,222,111,222,999.
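Here is a minimal sketch of the difference. Note that the two patterns are my assumptions about what the regexes look like (they are not quoted in this post); only the test string is taken from the paragraph above.

```perl
use strict;
use warnings;

my $string = '123,222,456,222,222,111,222,999';

# Assumed pattern WITH a trailing comma inside the lookahead:
# the three digits must be followed by a comma.
my @with_comma = $string =~ /222,(?=(\d\d\d),)/g;

# Assumed pattern WITHOUT the trailing comma:
# the three digits may also sit at the very end of the string.
my @without_comma = $string =~ /222,(?=(\d\d\d))/g;

print "with comma:    @with_comma\n";      # 456 222 111
print "without comma: @without_comma\n";   # 456 222 111 999
```

The final 999 has no comma after it, so only the second pattern captures it.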

Update 1: An illustration may make things clearer. Say your search string is "222,222,111", and the regex is /222,(\d\d\d)/g. The regex engine begins its search at the first character:

    222,222,111
    ^
    =======   <-- 1st match: 222,222   Capture: 222

and finds a match. Then the search for the next match begins at the character immediately following the end of the previous match:

    222,222,111
           ^

Not finding a match here, it moves forward one character:

    222,222,111
            ^

and finds no match; and so on, one character at a time, to the end of the string.

But if the regex has a lookahead assertion, /222,(?=(\d\d\d))/g, the search for a second match again begins one character beyond the end of the previous match, but this time the lookahead assertion itself is not counted as a part of that match, so the regex engine starts looking here:

    222,222,111
        ^
        =======   <-- 2nd match: 222,111   Capture: 111

and finds the second match. Note that the lookahead assertion has effectively[1] shifted the regex search back by 3 characters, not 4 as implied by the title of this thread: a small point, but perhaps useful in helping to clarify what is going on.
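If you want to check the illustration yourself, something like the following should do it. It runs the two regexes from Update 1 against the same string in list context; the variable names are just mine, for illustration.

```perl
use strict;
use warnings;

my $string = '222,222,111';

# Consuming version: the three digits are part of the match itself,
# so the next search starts after them.
my @consumed = $string =~ /222,(\d\d\d)/g;

# Lookahead version: the three digits are captured but not consumed,
# so the next search starts right after the first "222,".
my @peeked = $string =~ /222,(?=(\d\d\d))/g;

print "without lookahead: @consumed\n";   # 222
print "with lookahead:    @peeked\n";     # 222 111
```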

Updates 2 & 3: Re-wrote Update 1 to fix various errors.

[1] Update 4: As AnomalousMonk notes, it would be more accurate “to say that a zero-width assertion does not move the search position at all, even if it captures.”
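One way to see this for yourself is to print pos() after each match in a scalar-context loop. This is only a rough sketch using the string and regexes from Update 1; the variable names are my own.

```perl
use strict;
use warnings;

# Lookahead version: record where the next search will start
# after each successful match.
my $string = '222,222,111';
my @positions;
while ($string =~ /222,(?=(\d\d\d))/g) {
    push @positions, pos($string);
    printf "captured %s, next search starts at offset %d\n",
        $1, pos($string);
}

# Consuming version, for comparison: one match in scalar context.
my $consuming = '222,222,111';
my $consumed_pos;
if ($consuming =~ /222,(\d\d\d)/g) {
    $consumed_pos = pos($consuming);
}
print "consuming regex leaves the search at offset $consumed_pos\n";
```

With the lookahead, the search position after the first match is offset 4 (just past "222,"), whereas the consuming regex leaves it at offset 7: that is the 3-character difference mentioned above.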

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,


In reply to Re: Shift regex search back 4 characters... by Athanasius
in thread Shift regex search back 4 characters... by tj999
