in reply to Re: Bug with finding all regexp matches
in thread Bug with finding all regexp matches

For
perl -e 'use re "debug"; "01234" =~ /^(.+)(.+)((?:.z?)+)$(?{ print "$1 $2 $3\n" })(*FAIL)/' 2>&1 | less
for the missing "0 1 234" the trace has
whilem: (cache) already tried at this position... failed...
My wild guess is that it cached the effect of (*FAIL) rather than a real failure of the regexp, so the EVAL is skipped and the last result is missed. I didn't find a way to disable the cache with "use re".

Re^3: Bug with finding all regexp matches
by Anonymous Monk on Oct 15, 2016 at 18:18 UTC
    To add to the previous post: For
    perl -e 'use re "debug"; "01234" =~ /^(.+?)(.+)((?:.z?)+)$(?{ print "$1 $2 $3\n" })(*FAIL)/' 2>&1 | less
    two results are missing and there are two entries
    whilem: (cache) already tried at this position... failed...
    It figures.
      I opened a bug report for this: https://rt.perl.org/Ticket/Display.html?id=129886
        This isn't a bug, it's intended behaviour :-). The superlinear cache kicks in for various types of complex/nested quantifiers (such as (z?)+) to avoid taking heat-death-of-the-universe time to find a failure. It records pattern/string position combos where a match failed. If it comes across the same combo again, it stops and backtracks immediately at that point rather than continuing with the rest of the pattern (which would inevitably fail later anyway). That is why the code block is never reached for those positions.

        See the very first paragraph in perlre.pod about re_evals:

        B<WARNING>: Using this feature safely requires that you understand its limitations. Code executed that has side effects may not perform identically from version to version due to the effect of future optimisations in the regex engine. For more information on this, see L</Embedded Code Execution Frequency>
        That link points to the "Embedded Code Execution Frequency" section of perlre.

        Dave.