ysth has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to optimize a regex using the SKIP and MARK verbs, and not understanding how they should be properly used together. Sadly, the documentation provides no examples, and doesn't seem to rigorously define its terms:

"(*MARK:*NAME*)" "(*:*NAME*)"

This zero-width pattern can be used to mark the point reached in a string when a certain part of the pattern has been successfully matched. This mark may be given a name. A later "(*SKIP)" pattern will then skip forward to that point if backtracked into on failure.
Given that, I'm expecting this:
"aab" =~ /(*SKIP:go)(.)(?!\1(*MARK:go))/

to match the b. Instead, it matches the second a, just as it would if the skip and mark were omitted.

Perhaps iterating over the string to begin the match doesn't count as "backtracked into"? But then I'd expect

"aab"=~/^.*?(*SKIP:go)(.)(?!\1(*MARK:go))/
to capture the b, yet it again captures the second a.

Perhaps subpatterns such as (?!...) don't share marks with the main pattern? The slightly more verbose pcre doc seems to say that. But then this should work:

"aab"=~/(*SKIP:go)(.)(?(?=\1)\1(*MARK:go)(*FAIL)|)/
and it doesn't.

I suspect I'm just misunderstanding something fundamental here.



--
A math joke: r = | |csc(θ)|+|sec(θ)| |-| |csc(θ)|-|sec(θ)| |

Re: understanding (*SKIP:...)
by ikegami (Patriarch) on Jun 11, 2025 at 22:05 UTC

    That (*SKIP:name) precedes (*MARK:name) in your pattern makes no sense, and leads me to think you believe a MARK is akin to a label and a SKIP is akin to a goto. That is not the case at all.

    What we're marking and skipping to are positions in the string being matched.

    (*MARK:name) bookmarks the current match position under the provided name.
    (*SKIP:name) prevents everything before the named bookmarked position from being part of the final match if we backtrack through it.
    (*SKIP) prevents everything before the current position from being part of the final match if we backtrack through it.
    (*SKIP) is basically the same as (*MARK:anon)(*SKIP:anon).
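
The equivalence in the last line of that list can be checked with a small sketch. The helper name matched_text and the mark name anon are only illustrative; the test string is the "aab" from the question.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the matched text, or undef if nothing matches.
sub matched_text {
    my ($str, $re) = @_;
    return $str =~ $re ? $& : undef;
}

# Bare (*SKIP): the skip position is wherever (*SKIP) itself was reached.
my $bare = qr/(.)\1(*SKIP)(*FAIL)|.*/;

# Explicit form: bookmark the position, then immediately skip to that bookmark.
my $named = qr/(.)\1(*MARK:anon)(*SKIP:anon)(*FAIL)|.*/;

say matched_text("aab", $bare);     # b
say matched_text("aab", $named);    # b
```

Both patterns bookmark position 2 before failing, so both should resume the search there.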

    Solution:

    if ( "aab" =~ /(.)\1(*SKIP)(*FAIL)|.*/ ) {
        say $&;  # b
    }

    (*SKIP) was matched at position 2. When (*FAIL) caused the matching to backtrack through (*SKIP), everything before position 2 was eliminated from potential matches.
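
To see what the (*SKIP) contributes, here is a rough comparison of the same pattern with and without it. The helper text_and_offset is a made-up name; $-[0] is the start offset of the last successful match.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return (matched text, start offset), or an empty list.
sub text_and_offset {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($&, $-[0]);
}

# With (*SKIP): backtracking through it abandons the attempt at offset 0,
# including the ".*" alternative, and the search resumes at offset 2.
my ($with_text, $with_off) =
    text_and_offset("aab", qr/(.)\1(*SKIP)(*FAIL)|.*/);

# Without (*SKIP): the first alternative simply fails, and ".*" is then
# tried at the same starting offset 0.
my ($without_text, $without_off) =
    text_and_offset("aab", qr/(.)\1(*FAIL)|.*/);

say "with (*SKIP):    '$with_text' at offset $with_off";        # 'b' at offset 2
say "without (*SKIP): '$without_text' at offset $without_off";  # 'aab' at offset 0
```

The difference between the two results is the part that is easy to miss: the second alternative is never tried at the original starting position once (*SKIP) has been backtracked through.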

    Non-trivial example using (*MARK:...):

    If instead you want ab, you could use

    if ( "aab" =~ /(.)(*MARK:go)\1(*SKIP:go)(*FAIL)|.*/ ) {
        say $&;  # ab
    }

    This time, only the text before position 1 was eliminated from potential matches.
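
Putting the two solutions side by side makes the effect of the mark position visible. This is only a sketch; the helper resume_offset and the labels are illustrative names.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: start offset of the successful match, or undef.
sub resume_offset {
    my ($str, $re) = @_;
    return $str =~ $re ? $-[0] : undef;
}

my @cases = (
    # Bare (*SKIP) after both characters: skip position is offset 2.
    [ 'bare (*SKIP)'            => qr/(.)\1(*SKIP)(*FAIL)|.*/ ],
    # Mark placed after the first character only: skip position is offset 1.
    [ '(*MARK:go)...(*SKIP:go)' => qr/(.)(*MARK:go)\1(*SKIP:go)(*FAIL)|.*/ ],
);

for my $case (@cases) {
    my ($label, $re) = @$case;
    printf "%-24s match starts at offset %d\n", $label, resume_offset("aab", $re);
}
# bare (*SKIP)             match starts at offset 2
# (*MARK:go)...(*SKIP:go)  match starts at offset 1
```

In both cases the skip is triggered at the same point in the pattern; only the bookmarked string position differs.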

      I think I understand everything you are saying. What I wasn't getting is that backtracking through a SKIP immediately terminates the match attempt at its current starting position. I wasn't thinking of it as a goto; I just assumed I had to do something else to keep the current match attempt from proceeding on other branches.

      I don't think I understand yet why this didn't work, though:

      /(*SKIP:go)(.)(?(?=\1)\1(*MARK:go)(*FAIL)|)/
      Is it simply that the go mark is looked for when the SKIP is encountered, not when it is backtracked through?

        Either that, or the mark is scoped so that backtracking through it forgets it. Or both.
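
One way to probe this is to compare the pattern against the same pattern with the SKIP and MARK removed. The sketch below is consistent with either explanation and does not tell them apart; the helper capture_and_offset is a made-up name, and (?!) stands in for (*FAIL) in the verb-free version.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return ($1, start offset), or an empty list.
sub capture_and_offset {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($1, $-[0]);
}

# (*SKIP:go) is reached before any (*MARK:go) has been set.
my @early_skip = capture_and_offset(
    "aab", qr/(*SKIP:go)(.)(?(?=\1)\1(*MARK:go)(*FAIL)|)/);

# Same pattern with SKIP and MARK removed.
my @no_verbs = capture_and_offset(
    "aab", qr/(.)(?(?=\1)\1(?!)|)/);

say "early (*SKIP:go): captured '$early_skip[0]' at offset $early_skip[1]";
say "no SKIP/MARK:     captured '$no_verbs[0]' at offset $no_verbs[1]";
```

If the two lines report the same capture and offset, the early (*SKIP:go) had no observable effect on this input.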

Re: understanding (*SKIP:...)
by ysth (Canon) on Jun 11, 2025 at 21:53 UTC
    This seems to actually work:
    "aab"=~/(.)(?(?=\1)\1(*SKIP)(*FAIL)|)/
    (or a mark followed by a named skip instead of the bare skip) but I'm reluctant to use it without actually feeling like I understand.
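
For reference, a short sketch of both variants mentioned here, run on "aab" and on a second, made-up input "aabbc". The helper first_capture is an illustrative name.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return $1 from the first successful match, or undef.
sub first_capture {
    my ($str, $re) = @_;
    return $str =~ $re ? $1 : undef;
}

# Bare skip inside the "yes" branch of the conditional.
my $bare = qr/(.)(?(?=\1)\1(*SKIP)(*FAIL)|)/;

# Mark followed by a named skip at the same point.
my $named = qr/(.)(?(?=\1)\1(*MARK:go)(*SKIP:go)(*FAIL)|)/;

for my $str ("aab", "aabbc") {
    say "$str: bare=", first_capture($str, $bare),
        " named=", first_capture($str, $named);
}
# aab: bare=b named=b
# aabbc: bare=c named=c
```

Each doubled character is consumed as a pair and the search resumes after it, so the second "a" in "aab" is never tried as a starting point.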