in reply to Re^3: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)
in thread Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)

I was curious what a "detected a super-linear match" means and asked Duck-AI. Explanation below. (LLMs are good in condensing knowledge as long as you review the output like I did. see also https://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex)

Super-linear Match A super-linear match in the context of regular expressions refers to a situation where the time complexity of matching a pattern against a string exceeds linear time, often approaching quadratic or worse. This typically occurs due to certain patterns in the regular expression that can lead to excessive backtracking.

Characteristics of Super-Linear Matches

Example

Consider the regex pattern ^(a|b)*c$ and the input string aaaaaaab. The regex engine may try to match the string by considering all possible combinations of a and b, leading to a significant increase in processing time.

Conclusion

To avoid super-linear matches, it's essential to design regular expressions carefully, minimizing the use of nested quantifiers and ambiguous patterns. This can help ensure that the matching process remains efficient and operates within linear time complexity.

Basically its trying to detect that the complexity is more ("super") than linear and turns on caching.

Since the situation from starting from "c" without prior match is the same like with a prior "b" or "ab" the cached result of a FAIL can be used to shortcut the search when starting from "c", "d" and so on.

Of course you could also apply the same mechanism with recursive regexes, but as I said the use cases are rare.

Cheers

- LanX (LogIn timing out again)

  • Comment on Re^4: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)
  • Select or Download Code