Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)

Replies are listed 'Best First'.
Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by Anonymous Monk on Sep 24, 2025 at 22:19 UTC
(now it's not related to the PWC task), w.r.t. "horrible amount of backtracking", I think these 2 patterns `/(.(?1)(F))/ /(.)(F)/` [download] should backtrack about the same amount. Yet one is 4 orders of magnitude slower (with sample sizes and hardware I'm using) and clearly exponential. The other stays linear. Sorry I didn't re-run it with more "REPEATS" and am leaving it untidy, it's just "mad science" for laughs of course i.e. I now see recursive pattern wasn't a good idea to use for the task. 1.4 +-------------------------------------------------+ \| + + + + + + \| \| A\| \| \| \| \| \| A\| \| A \| \| \| \| A \| \| \| \| A \| 1.2 \|-+ +-\| \| A \| \| A \| \| \| \| A \| \| \| \| A \| \| A \| \| \| \| A \| \| \| 1 \|-+ A +-\| \| A \| \| \| \| A \| \| A \| \| \| \| A \| \| A \| \| \| \| A \| \| A \| 0.8 \|-+ +-\| \| A \| \| A \| \| \| \| A \| \| A \| \| \| \| AA \| \| \| \| A \| 0.6 \|-+ A +-\| \| \| \| AA \| \| \| \| A \| \| A \| \| A \| \| \| \| A \| \| A \| \| \| 0.4 \|-+ A +-\| \| A \| \| A \| \| AA \| \| \| \| AA \| \| A \| \| \| \| AAA \| \| A \| \| A \| 0.2 \|-+ A +-\| \| A \| \| A \| \| AAA \| \| AA \| \| A \| \| AA \| \| AA \| \| AAA \| \| AAA \| \|A + + + + + + \| 0 +-------------------------------------------------+ 1000 2000 3000 4000 5000 6000 7000 8000 1.2 +-------------------------------------------------+ \| + + + + + + \| \| \| \| \| \| \| \| A\| \| A\| \| A \| \| A \| \| A \| \| A \| \| A \| \| A \| 1 \|-+ AA +-\| \| A \| \| A \| \| A \| \| A \| \| A \| \| AA \| \| \| \| AA \| \| A \| \| A \| \| A \| 0.8 \|-+ A +-\| \| A \| \| AA \| \| \| \| A \| \| AA \| \| A \| \| A \| \| AA \| \| A \| \| A \| \| A \| \| A \| 0.6 \|-+ A +-\| \| A \| \| A \| \| A \| \| A \| \| A \| \| A \| \| AA \| \| A \| \| A \| \| A \| \| A \| \| A \| 0.4 \|-+ A +-\| \| A \| \| A \| \| A \| \| A \| \| A \| \| A \| \| A \| \| A \| \| A \| \| AA \| \| \| 0.2 \|-+ A +-\| \| A \| \| A \| \|A \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| + + + + + + \| 0 +-------------------------------------------------+ 1e+07 2e+07 3e+07 4e+07 5e+07 6e+07 7e+07 8e+07 [download] #### use strict; use warnings; use Benchmark qw/ timeit :hireswallclock /; use Chart::Gnuplot; use File::Spec::Functions 'catfile'; use File::Basename 'dirname'; my $GP = catfile dirname( $^X ), '../../c/bin/gnuplot.exe'; die unless -x $GP; # assume Strawberry Perl use constant REPEATS => 1; STDOUT-> autoflush( 1 ); { my ( @x, @y, @datasets ); for my $x ( map 1e6 $_, 10 .. 80 ) { print "$x "; my $s = 'a' x $x; my $t = timeit REPEATS, sub { $s =~ /(.)(F)/ }; my $y = $t-> [ 1 ] / $t-> [ -1 ]; push @x, $x; push @y, $y; } print "\n\n"; push @datasets, Chart::Gnuplot::DataSet-> new( xdata => \@x, ydata => \@y, style => 'points', ); my $chart = Chart::Gnuplot-> new( gnuplot => $GP, terminal => 'dumb size 60, 80', ); $chart-> plot2d( @datasets ); } { my ( @x, @y, @datasets ); for my $x ( map 1e2 * $_, 10 .. 80 ) { print "$x "; my $s = 'a' x $x; my $t = timeit REPEATS, sub { $s =~ /(.(?1)(*F))/ }; my $y = $t-> [ 1 ] / $t-> [ -1 ]; push @x, $x; push @y, $y; } print "\n\n"; push @datasets, Chart::Gnuplot::DataSet-> new( xdata => \@x, ydata => \@y, style => 'points', ); my $chart = Chart::Gnuplot-> new( gnuplot => $GP, terminal => 'dumb size 60, 80', ); $chart-> plot2d( @datasets ); } [download]	[reply] [d/l] [select]
Re^2: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 26, 2025 at 14:39 UTC
I think these 2 patterns /(.(?1)(F))/ /(.)(F)/ should backtrack about the same amount. Nope, seems like the first has quadratic and the second one linear complexity. `use v5.12; use warnings; say my $az = join "","a".."z"; $az x= 100; my $c; my $show=0; sub COUNT { $c++; say "Step $c matches '",$2 // "","'" if $show; } $show=0; for my $l (1..24) { say "=== Length Str: ",$l; my $str = substr $az,0,$l; $c = 0; $str =~ /( (.) (?{COUNT}) (?1) (FAIL))/x; say "$l : $c = ", $l($l+1)/2; $c=0; $str =~ /( (?: (.) (?{COUNT}) ) (FAIL) )/x; say "$l : $c = ", 2 $l-1; }` [download] output # perl /home/lanx/perl/pm/recursive_regex.pl abcdefghijklmnopqrstuvwxyz === Length Str: 1 1 : 0 = 1 1 : 1 = 1 === Length Str: 2 2 : 3 = 3 2 : 3 = 3 === Length Str: 3 3 : 6 = 6 3 : 5 = 5 === Length Str: 4 4 : 10 = 10 4 : 7 = 7 === Length Str: 5 5 : 15 = 15 5 : 9 = 9 === Length Str: 6 6 : 21 = 21 6 : 11 = 11 === Length Str: 7 7 : 28 = 28 7 : 13 = 13 === Length Str: 8 8 : 36 = 36 8 : 15 = 15 === Length Str: 9 9 : 45 = 45 9 : 17 = 17 === Length Str: 10 10 : 55 = 55 10 : 19 = 19 === Length Str: 11 11 : 66 = 66 11 : 21 = 21 === Length Str: 12 12 : 78 = 78 12 : 23 = 23 === Length Str: 13 13 : 91 = 91 13 : 25 = 25 === Length Str: 14 14 : 105 = 105 14 : 27 = 27 === Length Str: 15 15 : 120 = 120 15 : 29 = 29 === Length Str: 16 16 : 136 = 136 16 : 31 = 31 === Length Str: 17 17 : 153 = 153 17 : 33 = 33 === Length Str: 18 18 : 171 = 171 18 : 35 = 35 === Length Str: 19 19 : 190 = 190 19 : 37 = 37 === Length Str: 20 20 : 210 = 210 20 : 39 = 39 === Length Str: 21 21 : 231 = 231 21 : 41 = 41 === Length Str: 22 22 : 253 = 253 22 : 43 = 43 === Length Str: 23 23 : 276 = 276 23 : 45 = 45 === Length Str: 24 24 : 300 = 300 24 : 47 = 47 [download] Reason: Perl regex are highly pre-optimized by using heuristics. HTH - LanX (Log In Timed out)	[reply] [d/l] [select]
Re^3: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 26, 2025 at 22:55 UTC
> Reason: Perl regex are highly pre-optimized by using heuristics. Actually it's a post-optimization. After the first full backtrack (from "a" to end/FAIL) the 2nd regex reports "Detected a super-linear match, switching on caching..." and runs from "b" to FAIL. After that the cached results are later used to see that no match is possible and backtracking is stopped. That's why the formula is `n+(n-1)` So it's not that the 1st regex with the recursion is very bad�, it's just that the second is very clever. Hint: `use re "debug"` to see the details. Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery} �) there still might be a memory leak somewhere, but who is using recursion to this extremes anyway?	[reply] [d/l] [select]
Re^4: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 28, 2025 at 17:11 UTC
Re^2: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 24, 2025 at 23:09 UTC
I could think of various reasons: recursive patterns are not optimized for going deep but rather meant for parsing structures like code or data with at best hundreds of nesting levels like with subs every recursion means storing a certain amount of data on the stack branching doesn't happen that often when parsing a grammar, bc there is only a small amount of opening "brackets" to follow deeper "normal" regexes are optimized by heuristics to take shortcuts, i.e. `.` is not naively implemented going step by step My suggestion would be that you `use` `re` `'debug'` to see whats happening. Especially I'd try to make sure that it's really the same amount of backtracking in both cases update FWIW: my OS is killing the process when I attempt to have 1e7 recursions. `DB<3> ("a"x 1e6) =~ / (. (?: (?1) \| ) ) /x; say length $1 1000000 DB<4> ("a"x 1e7) =~ /(.)/; say length $1 10000000 DB<5> ("a"x 1e7) =~ / (. (?: (?1) \| ) ) /x; say length $1 Killed` [download] My guess: memory problems on the stack. Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 24, 2025 at 14:15 UTC
UPDATE: see Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) the following is no full solution this `$str =~ s/((\w)(?1)?\g{-1})//g;` does the trick without while loop, you need a relative `\g{-1}` reference when doing recursions. for extra speed I'll do `$str =~ s/((\w)(?1)?\g{-1}+)//g;` `# https://theweeklychallenge.org/blog/perl-weekly-challenge-340/ use v5.12; use warnings; use Test::More; for my $case ('abbaca','azxxzy','aaaaaaaa','aabccba','abcddcba') { my $str = $case; say "* Input: $str"; while ( $str =~s/(\w)\1// ) { say "Remove '$&' => '$str'"; } my $exp = $str; say "Expected Output : '$str'"; $str = $case; $str =~ s/((\w)(?1)?\g{-1})//g; say "Got Output : '$str'"; is($exp,$str,"'$case' ->'$str'"); say "...\n" } done_testing()` [download] * Input: abbaca Remove 'bb' => 'aaca' Remove 'aa' => 'ca' Expected Output : 'ca' Got Output : 'ca' ok 1 - 'abbaca' ->'ca' ... * Input: azxxzy Remove 'xx' => 'azzy' Remove 'zz' => 'ay' Expected Output : 'ay' Got Output : 'ay' ok 2 - 'azxxzy' ->'ay' ... * Input: aaaaaaaa Remove 'aa' => 'aaaaaa' Remove 'aa' => 'aaaa' Remove 'aa' => 'aa' Remove 'aa' => '' Expected Output : '' Got Output : '' ok 3 - 'aaaaaaaa' ->'' ... * Input: aabccba Remove 'aa' => 'bccba' Remove 'cc' => 'bba' Remove 'bb' => 'a' Expected Output : 'a' Got Output : 'a' ok 4 - 'aabccba' ->'a' ... * Input: abcddcba Remove 'dd' => 'abccba' Remove 'cc' => 'abba' Remove 'bb' => 'aa' Remove 'aa' => '' Expected Output : '' Got Output : '' ok 5 - 'abcddcba' ->'' ... 1..5 [download] Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 24, 2025 at 13:14 UTC
Your "recursive subpattern" approach still needs an outer while loop, so I don't really see the point, apart of complicating it. (I think /g stops if there is either no match or pos hits the end. To avoid an outer while-loop you might need to anchor a pattern at the start without changing `pos` , like inside a look-ahead ) FWIW: This would be my approach: `# https://theweeklychallenge.org/blog/perl-weekly-challenge-340/ use v5.12; use warnings; for ('abbaca','azxxzy','aaaaaaaa','aabccba','abcddcba') { my $str = $_; say "* Input: $str"; while ($str =~ s/(\w)\1+//g) { say "Remove '$&' => '$str'" } ; say "Output: '$str'\n\n"; }` [download] It's producing the same "Output" while skipping some steps. (for exactly the same intermediate steps like in in the weekly challenge remove /g and +) `* Input: abbaca Remove 'bb' => 'aaca' Remove 'aa' => 'ca' Output: 'ca' * Input: azxxzy Remove 'xx' => 'azzy' Remove 'zz' => 'ay' Output: 'ay' * Input: aaaaaaaa Remove 'aaaaaaaa' => '' Output: '' * Input: aabccba Remove 'cc' => 'bba' Remove 'bb' => 'a' Output: 'a' * Input: abcddcba Remove 'dd' => 'abccba' Remove 'cc' => 'abba' Remove 'bb' => 'aa' Remove 'aa' => '' Output: ''` [download] Tho I can't guaranty that there isn't a case where only replacing 2 instead of 2 or more leads to different results. Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?) by LanX (Saint) on Sep 24, 2025 at 21:15 UTC
Turns out `\2` or `\g{-1}` is not the point here, while the latter is far more stable when refactoring. Consider this input: `"abbcca"` Your recursive approach without an outer loop will produce `"aa"` while the solution is `""` This might (rather not) be a solution without loops. `$s =~ s/ ( ( (.) \g{-1} )+ \| (.) (?1) \g{-1} ) //xg;` [download] Frankly I don't know, it's only passing all the tests so far. Regarding your performance question: the recursive search does a horrible amount of backtracking trying to find a match. Because every single character allows you to descend further and further, and with longer strings it'll take a while till you mostly don't find the counterpart, and have to track back. Try to insert various `(?{perl snippets})` in your regex and you'll see how many useless branches are searched. At the same time the original algorithm just needs to compare to neighboring letters, and is hence far faster. Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply] [d/l] [select]

(now it's not related to the PWC task), w.r.t. "horrible amount of backtracking", I think these 2 patterns

/(.(?1)(*F))/
/(.*)(*F)/
[download]

should backtrack about the same amount. Yet one is 4 orders of magnitude slower (with sample sizes and hardware I'm using) and clearly exponential. The other stays linear. Sorry I didn't re-run it with more "REPEATS" and am leaving it untidy, it's just "mad science" for laughs of course i.e. I now see recursive pattern wasn't a good idea to use for the task.

  1.4 +-------------------------------------------------+
      |      +      +      +       +      +      +      |
      |                                                A|
      |                                                 |
      |                                                 |
      |                                                A|
      |                                               A |
      |                                                 |
      |                                              A  |
      |                                                 |
      |                                             A   |
  1.2 |-+                                             +-|
      |                                             A   |
      |                                            A    |
      |                                                 |
      |                                           A     |
      |                                                 |
      |                                           A     |
      |                                          A      |
      |                                                 |
      |                                         A       |
      |                                                 |
    1 |-+                                      A      +-|
      |                                        A        |
      |                                                 |
      |                                       A         |
      |                                      A          |
      |                                                 |
      |                                      A          |
      |                                     A           |
      |                                                 |
      |                                    A            |
      |                                   A             |
  0.8 |-+                                             +-|
      |                                   A             |
      |                                  A              |
      |                                                 |
      |                                 A               |
      |                                 A               |
      |                                                 |
      |                               AA                |
      |                                                 |
      |                              A                  |
  0.6 |-+                            A                +-|
      |                                                 |
      |                            AA                   |
      |                                                 |
      |                            A                    |
      |                           A                     |
      |                          A                      |
      |                                                 |
      |                         A                       |
      |                        A                        |
      |                                                 |
  0.4 |-+                     A                       +-|
      |                       A                         |
      |                      A                          |
      |                    AA                           |
      |                                                 |
      |                   AA                            |
      |                  A                              |
      |                                                 |
      |                AAA                              |
      |               A                                 |
      |               A                                 |
  0.2 |-+            A                                +-|
      |             A                                   |
      |             A                                   |
      |          AAA                                    |
      |         AA                                      |
      |        A                                        |
      |       AA                                        |
      |     AA                                          |
      |   AAA                                           |
      | AAA                                             |
      |A     +      +      +       +      +      +      |
    0 +-------------------------------------------------+
     1000   2000   3000   4000    5000   6000   7000   8000

  1.2 +-------------------------------------------------+
      |      +      +      +       +      +      +      |
      |                                                 |
      |                                                 |
      |                                                 |
      |                                                A|
      |                                                A|
      |                                               A |
      |                                              A  |
      |                                             A   |
      |                                             A   |
      |                                            A    |
      |                                           A     |
    1 |-+                                        AA   +-|
      |                                         A       |
      |                                        A        |
      |                                        A        |
      |                                       A         |
      |                                      A          |
      |                                     AA          |
      |                                                 |
      |                                   AA            |
      |                                   A             |
      |                                  A              |
      |                                 A               |
  0.8 |-+                               A             +-|
      |                                A                |
      |                              AA                 |
      |                                                 |
      |                              A                  |
      |                            AA                   |
      |                            A                    |
      |                           A                     |
      |                         AA                      |
      |                         A                       |
      |                        A                        |
      |                       A                         |
      |                       A                         |
  0.6 |-+                    A                        +-|
      |                     A                           |
      |                    A                            |
      |                    A                            |
      |                   A                             |
      |                  A                              |
      |                  A                              |
      |                AA                               |
      |               A                                 |
      |               A                                 |
      |              A                                  |
      |             A                                   |
      |             A                                   |
  0.4 |-+          A                                  +-|
      |           A                                     |
      |          A                                      |
      |         A                                       |
      |        A                                        |
      |        A                                        |
      |       A                                         |
      |      A                                          |
      |     A                                           |
      |     A                                           |
      |   AA                                            |
      |                                                 |
  0.2 |-+ A                                           +-|
      |  A                                              |
      | A                                               |
      |A                                                |
      |                                                 |
      |                                                 |
      |                                                 |
      |                                                 |
      |                                                 |
      |                                                 |
      |                                                 |
      |                                                 |
      |      +      +      +       +      +      +      |
    0 +-------------------------------------------------+
    1e+07  2e+07  3e+07  4e+07   5e+07  6e+07  7e+07  8e+07
[download]

####

use strict;
use warnings;

use Benchmark qw/ timeit :hireswallclock /;
use Chart::Gnuplot;
use File::Spec::Functions 'catfile';
use File::Basename 'dirname';

my $GP = catfile dirname( $^X ), '../../c/bin/gnuplot.exe';
die unless -x $GP;              # assume Strawberry Perl

use constant REPEATS => 1;

STDOUT-> autoflush( 1 );

{
    my ( @x, @y, @datasets );
    for my $x ( map 1e6 * $_, 10 .. 80 ) {
        print "$x ";
        my $s = 'a' x $x;
        my $t = timeit REPEATS, sub { $s =~ /(.*)(*F)/ };
        my $y = $t-> [ 1 ] / $t-> [ -1 ];
        push @x, $x;
        push @y, $y;
    }
    print "\n\n";
    push @datasets, Chart::Gnuplot::DataSet-> new(
        xdata => \@x,
        ydata => \@y,
        style => 'points',
    );
    my $chart = Chart::Gnuplot-> new( 
        gnuplot  => $GP,
        terminal => 'dumb size 60, 80',
    );
    $chart-> plot2d( @datasets );
}
{
    my ( @x, @y, @datasets );
    for my $x ( map 1e2 * $_, 10 .. 80 ) {
        print "$x ";
        my $s = 'a' x $x;
        my $t = timeit REPEATS, sub { $s =~ /(.(?1)(*F))/ };
        my $y = $t-> [ 1 ] / $t-> [ -1 ];
        push @x, $x;
        push @y, $y;
    }
    print "\n\n";
    push @datasets, Chart::Gnuplot::DataSet-> new(
        xdata => \@x,
        ydata => \@y,
        style => 'points',
    );
    my $chart = Chart::Gnuplot-> new( 
        gnuplot  => $GP,
        terminal => 'dumb size 60, 80',
    );
    $chart-> plot2d( @datasets );
}
[download]

[reply]
[d/l]
[select]

I think these 2 patterns
/(.(?1)(*F))/ /(.*)(*F)/
should backtrack about the same amount.

use v5.12;
use warnings;


say my $az = join "","a".."z";
$az x= 100;

my $c;
my $show=0;
sub COUNT {
  $c++;
  say "Step $c matches '",$2 // "","'"
    if $show;
}
$show=0;
for my $l (1..24)

  {
  say "=== Length Str: ",$l;
  my $str = substr $az,0,$l;
  $c = 0;
  $str =~ /( (.) (?{COUNT}) (?1) (*FAIL))/x; 
  say "$l : $c = ", $l*($l+1)/2;
  $c=0;
  $str =~ /( (?: (.) (?{COUNT}) )* (*FAIL) )/x;
  say "$l : $c = ", 2 *$l-1;
}
[download]

# perl /home/lanx/perl/pm/recursive_regex.pl
abcdefghijklmnopqrstuvwxyz
=== Length Str: 1
1 : 0 = 1
1 : 1 = 1
=== Length Str: 2
2 : 3 = 3
2 : 3 = 3
=== Length Str: 3
3 : 6 = 6
3 : 5 = 5
=== Length Str: 4
4 : 10 = 10
4 : 7 = 7
=== Length Str: 5
5 : 15 = 15
5 : 9 = 9
=== Length Str: 6
6 : 21 = 21
6 : 11 = 11
=== Length Str: 7
7 : 28 = 28
7 : 13 = 13
=== Length Str: 8
8 : 36 = 36
8 : 15 = 15
=== Length Str: 9
9 : 45 = 45
9 : 17 = 17
=== Length Str: 10
10 : 55 = 55
10 : 19 = 19
=== Length Str: 11
11 : 66 = 66
11 : 21 = 21
=== Length Str: 12
12 : 78 = 78
12 : 23 = 23
=== Length Str: 13
13 : 91 = 91
13 : 25 = 25
=== Length Str: 14
14 : 105 = 105
14 : 27 = 27
=== Length Str: 15
15 : 120 = 120
15 : 29 = 29
=== Length Str: 16
16 : 136 = 136
16 : 31 = 31
=== Length Str: 17
17 : 153 = 153
17 : 33 = 33
=== Length Str: 18
18 : 171 = 171
18 : 35 = 35
=== Length Str: 19
19 : 190 = 190
19 : 37 = 37
=== Length Str: 20
20 : 210 = 210
20 : 39 = 39
=== Length Str: 21
21 : 231 = 231
21 : 41 = 41
=== Length Str: 22
22 : 253 = 253
22 : 43 = 43
=== Length Str: 23
23 : 276 = 276
23 : 45 = 45
=== Length Str: 24
24 : 300 = 300
24 : 47 = 47
[download]

HTH

- LanX (Log In Timed out)

[reply]
[d/l]
[select]

> Reason: Perl regex are highly pre-optimized by using heuristics.

Actually it's a post-optimization.

After the first full backtrack (from "a" to end/FAIL) the 2nd regex reports

"Detected a super-linear match, switching on caching..."

and runs from "b" to FAIL.

After that the cached results are later used to see that no match is possible and backtracking is stopped. That's why the formula is n+(n-1)

So it's not that the 1st regex with the recursion is very bad�, it's just that the second is very clever.

Hint: use re "debug" to see the details.

Cheers Rolf
_{(addicted to the Perl Programming Language :)

see Wikisyntax for the Monastery}

�) there still might be a memory leak somewhere, but who is using recursion to this extremes anyway?

[reply]
[d/l]
[select]

recursive patterns are not optimized for going deep
but rather meant for parsing structures like code or data with at best hundreds of nesting levels
like with subs every recursion means storing a certain amount of data on the stack
branching doesn't happen that often when parsing a grammar, bc there is only a small amount of opening "brackets" to follow deeper
"normal" regexes are optimized by heuristics to take shortcuts, i.e. .* is not naively implemented going step by step

use

re

'debug'

Especially I'd try to make sure that it's really the same amount of backtracking in both cases

update

FWIW: my OS is killing the process when I attempt to have 1e7 recursions.

  DB<3> ("a"x 1e6) =~ / (. (?: (?1) | ) ) /x; say length $1
1000000

  DB<4> ("a"x 1e7) =~ /(.*)/; say length $1
10000000

  DB<5> ("a"x 1e7) =~ / (. (?: (?1) | ) ) /x; say length $1
Killed
[download]

My guess: memory problems on the stack.

Cheers Rolf
_{(addicted to the Perl Programming Language :)

see Wikisyntax for the Monastery}

[reply]
[d/l]
[select]

Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)

this

$str =~ s/((\w)(?1)?\g{-1})//g;

does the trick without while loop, you need a relative \g{-1} reference when doing recursions.

for extra speed I'll do

$str =~ s/((\w)(?1)?\g{-1}+)//g;

#  https://theweeklychallenge.org/blog/perl-weekly-challenge-340/
use v5.12;
use warnings;
use Test::More;

for my $case ('abbaca','azxxzy','aaaaaaaa','aabccba','abcddcba') {

  my $str = $case;
  say "*** Input: $str";
  while ( $str =~s/(\w)\1// ) {
    say "Remove '$&' => '$str'";
  } 
  my $exp = $str;
  say "Expected Output : '$str'";

  $str = $case;
  $str =~ s/((\w)(?1)?\g{-1})//g;
  say "Got Output : '$str'";
  is($exp,$str,"'$case' ->'$str'");

  say "...\n"
}

done_testing()
[download]

*** Input: abbaca Remove 'bb' => 'aaca' Remove 'aa' => 'ca' Expected Output : 'ca' Got Output : 'ca' ok 1 - 'abbaca' ->'ca' ... *** Input: azxxzy Remove 'xx' => 'azzy' Remove 'zz' => 'ay' Expected Output : 'ay' Got Output : 'ay' ok 2 - 'azxxzy' ->'ay' ... *** Input: aaaaaaaa Remove 'aa' => 'aaaaaa' Remove 'aa' => 'aaaa' Remove 'aa' => 'aa' Remove 'aa' => '' Expected Output : '' Got Output : '' ok 3 - 'aaaaaaaa' ->'' ... *** Input: aabccba Remove 'aa' => 'bccba' Remove 'cc' => 'bba' Remove 'bb' => 'a' Expected Output : 'a' Got Output : 'a' ok 4 - 'aabccba' ->'a' ... *** Input: abcddcba Remove 'dd' => 'abccba' Remove 'cc' => 'abba' Remove 'bb' => 'aa' Remove 'aa' => '' Expected Output : '' Got Output : '' ok 5 - 'abcddcba' ->'' ... 1..5
[download]

Cheers Rolf
_{(addicted to the Perl Programming Language :)

see Wikisyntax for the Monastery}

[reply]
[d/l]
[select]

(I think /g stops if there is either no match or pos hits the end. To avoid an outer while-loop you might need to anchor a pattern at the start without changing pos , like inside a look-ahead )

FWIW: This would be my approach:

#  https://theweeklychallenge.org/blog/perl-weekly-challenge-340/
use v5.12;
use warnings;

for ('abbaca','azxxzy','aaaaaaaa','aabccba','abcddcba') {
  my $str = $_;
  say "*** Input: $str";
  while ($str =~ s/(\w)\1+//g) {
    say "Remove '$&' => '$str'"
  } ;
  say "Output: '$str'\n\n";
}
[download]

It's producing the same "Output" while skipping some steps.

(for exactly the same intermediate steps like in in the weekly challenge remove /g and +)

*** Input: abbaca Remove 'bb' => 'aaca' Remove 'aa' => 'ca' Output: 'ca' *** Input: azxxzy Remove 'xx' => 'azzy' Remove 'zz' => 'ay' Output: 'ay' *** Input: aaaaaaaa Remove 'aaaaaaaa' => '' Output: '' *** Input: aabccba Remove 'cc' => 'bba' Remove 'bb' => 'a' Output: 'a' *** Input: abcddcba Remove 'dd' => 'abccba' Remove 'cc' => 'abba' Remove 'bb' => 'aa' Remove 'aa' => '' Output: ''
[download]

Tho I can't guaranty that there isn't a case where only replacing 2 instead of 2 or more leads to different results.

Cheers Rolf
_{(addicted to the Perl Programming Language :)

see Wikisyntax for the Monastery}

[reply]
[d/l]
[select]

\2

\g{-1}

Consider this input: "abbcca"

Your recursive approach without an outer loop will produce "aa" while the solution is ""

This might (rather not) be a solution without loops.

  $s =~ s/
       (
         (
           (.) \g{-1} 
         )+
       |
           (.) (?1) \g{-1}
       )
     //xg;
[download]

Frankly I don't know, it's only passing all the tests so far.

Regarding your performance question: the recursive search does a horrible amount of backtracking trying to find a match.

Because every single character allows you to descend further and further, and with longer strings it'll take a while till you mostly don't find the counterpart, and have to track back.

Try to insert various (?{perl snippets}) in your regex and you'll see how many useless branches are searched.

At the same time the original algorithm just needs to compare to neighboring letters, and is hence far faster.

Cheers Rolf
_{(addicted to the Perl Programming Language :)

see Wikisyntax for the Monastery}

[reply]
[d/l]
[select]