Re^2: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)

I think these 2 patterns
/(.(?1)(*F))/ /(.*)(*F)/
should backtrack about the same amount.

Nope, seems like the first has quadratic and the second one linear complexity.

use v5.12;
use warnings;


say my $az = join "","a".."z";
$az x= 100;

my $c;
my $show=0;
sub COUNT {
  $c++;
  say "Step $c matches '",$2 // "","'"
    if $show;
}
$show=0;
for my $l (1..24)

  {
  say "=== Length Str: ",$l;
  my $str = substr $az,0,$l;
  $c = 0;
  $str =~ /( (.) (?{COUNT}) (?1) (*FAIL))/x; 
  say "$l : $c = ", $l*($l+1)/2;
  $c=0;
  $str =~ /( (?: (.) (?{COUNT}) )* (*FAIL) )/x;
  say "$l : $c = ", 2 *$l-1;
}
[download]

output

# perl /home/lanx/perl/pm/recursive_regex.pl
abcdefghijklmnopqrstuvwxyz
=== Length Str: 1
1 : 0 = 1
1 : 1 = 1
=== Length Str: 2
2 : 3 = 3
2 : 3 = 3
=== Length Str: 3
3 : 6 = 6
3 : 5 = 5
=== Length Str: 4
4 : 10 = 10
4 : 7 = 7
=== Length Str: 5
5 : 15 = 15
5 : 9 = 9
=== Length Str: 6
6 : 21 = 21
6 : 11 = 11
=== Length Str: 7
7 : 28 = 28
7 : 13 = 13
=== Length Str: 8
8 : 36 = 36
8 : 15 = 15
=== Length Str: 9
9 : 45 = 45
9 : 17 = 17
=== Length Str: 10
10 : 55 = 55
10 : 19 = 19
=== Length Str: 11
11 : 66 = 66
11 : 21 = 21
=== Length Str: 12
12 : 78 = 78
12 : 23 = 23
=== Length Str: 13
13 : 91 = 91
13 : 25 = 25
=== Length Str: 14
14 : 105 = 105
14 : 27 = 27
=== Length Str: 15
15 : 120 = 120
15 : 29 = 29
=== Length Str: 16
16 : 136 = 136
16 : 31 = 31
=== Length Str: 17
17 : 153 = 153
17 : 33 = 33
=== Length Str: 18
18 : 171 = 171
18 : 35 = 35
=== Length Str: 19
19 : 190 = 190
19 : 37 = 37
=== Length Str: 20
20 : 210 = 210
20 : 39 = 39
=== Length Str: 21
21 : 231 = 231
21 : 41 = 41
=== Length Str: 22
22 : 253 = 253
22 : 43 = 43
=== Length Str: 23
23 : 276 = 276
23 : 45 = 45
=== Length Str: 24
24 : 300 = 300
24 : 47 = 47
[download]

Reason: Perl regex are highly pre-optimized by using heuristics.

HTH

- LanX (Log In Timed out)

Comment on Re^2: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) Select or Download Code

Replies are listed 'Best First'.
Re^3: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 26, 2025 at 22:55 UTC
> Reason: Perl regex are highly pre-optimized by using heuristics. Actually it's a post-optimization. After the first full backtrack (from "a" to end/FAIL) the 2nd regex reports "Detected a super-linear match, switching on caching..." and runs from "b" to FAIL. After that the cached results are later used to see that no match is possible and backtracking is stopped. That's why the formula is `n+(n-1)` So it's not that the 1st regex with the recursion is very badš, it's just that the second is very clever. Hint: `use re "debug"` to see the details. Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery} š) there still might be a memory leak somewhere, but who is using recursion to this extremes anyway?	[reply] [d/l] [select]
Re^4: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 28, 2025 at 17:11 UTC
I was curious what a "detected a super-linear match" means and asked Duck-AI. Explanation below. (LLMs are good in condensing knowledge as long as you review the output like I did. see also https://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex) <Reveal this spoiler or all in this thread> Basically its trying to detect that the complexity is more ("super") than linear and turns on caching. Since the situation from starting from "c" without prior match is the same like with a prior "b" or "ab" the cached result of a FAIL can be used to shortcut the search when starting from "c", "d" and so on. Of course you could also apply the same mechanism with recursive regexes, but as I said the use cases are rare. Cheers - LanX (LogIn timing out again)	[reply] [d/l] [select]