The author is clearly an idiot :)
The Perl porters have long known about pathological expressions that will continue to run after the heat-death of the universe. This doesn't mean those patterns should be banished, but with a little care, and a little thought, Perl probably offers the informed programmer techniques for supplying hints to the regular expression engine to avoid such exponential blowups.
I also find it interesting that Perl allowed me to write a very small amount of code in order to explore the problem. And the solution here is to not use greedy matching.
Change 'aaa' =~ /a?a?a?aaa/ to /a??a??a??aaa/ and you have to push n out to about 100000 before you begin to notice any appreciable slowdown in the matching. Take the following for a spin:
use strict; use Time::HiRes 'time'; my $n = shift || 3; my $opt = shift || '?'; my $str = 'a' x $n; my $re = ("a$opt" x $n) . $str; my $begin = time; print $str =~ $re, ' ', time-$begin, "\n";
Try running perl rematcher 20 and then perl rematcher 20000 ??. I get a better than three orders of magnitude improvement. Not bad for five minutes thought. Seeing how the Thomson graph climbs slowly but steadily from the beginning, I would not be surprised if Perl continued to outperform it all the way out until things begin to swap.
update: after finishing the article, I retract the statement that the author is an idiot. He does know his stuff, however, I still don't agree with the conclusion. He says that DFAs are better because you don't run into grief with patterns like /(.*)(.*)(.*)(.*)(.*)/ since it doesn't go exponential. This is just silly. If anything, it encourages sloppy programming.
Backreferences allow one to solve a much larger range of problems that DFAs cannot. That's why NFAs are in such widespread use. And if it takes a little education and experience to wield NFAs with assurance, then so be it.
• another intruder with the mooring in the heart of the Perl
In reply to Re: Perl regexp matching is slow??
by grinder
in thread Perl regexp matching is slow??
by smahesh
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |