Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
PWC #340 (current) task #1 is to delete pairs of duplicate adjacent letters until none are left. I thought "maybe it's a good place to use a recursive pattern instead of (possibly) many loop iterations" Did I write it wrong? Is it just not applicable for tasks like these?
use strict; use warnings; use Time::HiRes 'time'; my $str; $str .= chr 97 + rand 2 for 1 .. 5e3; { my $n = 0; my $s = $str; my $t = time; $n ++ while $s =~ s/((.)(?1)?\2)//g; printf qq(%3d loops, %.3f s, result: "%s"\n), $n, time - $t, $s } { my $n = 0; my $s = $str; my $t = time; $n ++ while $s =~ s/(.)\1//g; printf qq(%3d loops, %.3f s, result: "%s"\n), $n, time - $t, $s } # 4 loops, 1.542 s, result: "babababa" # 47 loops, 0.001 s, result: "babababa"
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)
by Anonymous Monk on Sep 24, 2025 at 22:19 UTC | |
(now it's not related to the PWC task), w.r.t. "horrible amount of backtracking", I think these 2 patterns
should backtrack about the same amount. Yet one is 4 orders of magnitude slower (with sample sizes and hardware I'm using) and clearly exponential. The other stays linear. Sorry I didn't re-run it with more "REPEATS" and am leaving it untidy, it's just "mad science" for laughs of course i.e. I now see recursive pattern wasn't a good idea to use for the task.
####
| [reply] [d/l] [select] |
by LanX (Saint) on Sep 26, 2025 at 14:39 UTC | |
I think these 2 patternsNope, seems like the first has quadratic and the second one linear complexity. output Reason: Perl regex are highly pre-optimized by using heuristics. HTH - LanX (Log In Timed out) | [reply] [d/l] [select] |
by LanX (Saint) on Sep 26, 2025 at 22:55 UTC | |
Actually it's a post-optimization. After the first full backtrack (from "a" to end/FAIL) the 2nd regex reports
and runs from "b" to FAIL. After that the cached results are later used to see that no match is possible and backtracking is stopped. That's why the formula is n+(n-1) So it's not that the 1st regex with the recursion is very bad¹, it's just that the second is very clever. Hint: use re "debug" to see the details.
Cheers Rolf
¹) there still might be a memory leak somewhere, but who is using recursion to this extremes anyway? | [reply] [d/l] [select] |
by LanX (Saint) on Sep 28, 2025 at 17:11 UTC | |
by LanX (Saint) on Sep 24, 2025 at 23:09 UTC | |
Especially I'd try to make sure that it's really the same amount of backtracking in both cases
updateFWIW: my OS is killing the process when I attempt to have 1e7 recursions.
My guess: memory problems on the stack.
Cheers Rolf
| [reply] [d/l] [select] |
Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)
by LanX (Saint) on Sep 24, 2025 at 14:15 UTC | |
this
does the trick without while loop, you need a relative \g{-1} reference when doing recursions. for extra speed I'll do
Cheers Rolf
| [reply] [d/l] [select] |
Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)
by LanX (Saint) on Sep 24, 2025 at 13:14 UTC | |
(I think /g stops if there is either no match or pos hits the end. To avoid an outer while-loop you might need to anchor a pattern at the start without changing pos , like inside a look-ahead ) FWIW: This would be my approach:
It's producing the same "Output" while skipping some steps. (for exactly the same intermediate steps like in in the weekly challenge remove /g and +)
Tho I can't guaranty that there isn't a case where only replacing 2 instead of 2 or more leads to different results.
Cheers Rolf
| [reply] [d/l] [select] |
Re: Recursive sub-pattern spectacularly slow, what's wrong? (me, or just this use case, or a bug?)
by LanX (Saint) on Sep 24, 2025 at 21:15 UTC | |
Consider this input: "abbcca" Your recursive approach without an outer loop will produce "aa" while the solution is "" This might (rather not) be a solution without loops.
Frankly I don't know, it's only passing all the tests so far. Regarding your performance question: the recursive search does a horrible amount of backtracking trying to find a match. Because every single character allows you to descend further and further, and with longer strings it'll take a while till you mostly don't find the counterpart, and have to track back. Try to insert various (?{perl snippets}) in your regex and you'll see how many useless branches are searched. At the same time the original algorithm just needs to compare to neighboring letters, and is hence far faster.
Cheers Rolf
| [reply] [d/l] [select] |