Re: regexp puzzle

There are people around here more qualified to discuss some of the guts, but I can give you a rough idea of what the article is suggesting. What they are saying is that Perl's algorithm handles repeated backtracking poorly.

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

for my $n (1 .. 25) {
    my @start = gettimeofday;
    
    $_ = 'a' x $n;
    /(?:a?){$n}a{$n}/;
    
    my @end = gettimeofday;
    
    my $elapsed = $end[0] - $start[0] + 10**-6 * ($end[1] - $start[1])
+;
    
    print "$elapsed\n";
}
[download]

outputs

4.5e-005
3.2e-005
3.1e-005
3.4e-005
3.6e-005
4.2e-005
5.4e-005
8.1e-005
0.00013
0.000232
0.000448
0.000899
0.001696
0.003383
0.006309
0.012884
0.026508
0.053743
0.108439
0.212322
0.431355
0.890586
1.703086
3.468719
7.17187
[download]

Frankly, while I'm sure there is room for improvement in the regex library, I would characterize their example as poorly written code - rather than using repeated one-off matches, you should be using variable-width matches:

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

for my $n (1 .. 25) {
    my @start = gettimeofday;
    
    $_ = 'a' x $n;
    /a{0,$n}a{$n}/;
    
    my @end = gettimeofday;
    
    my $elapsed = $end[0] - $start[0] + 10**-6 * ($end[1] - $start[1])
+;
    
    print "$elapsed\n";
}
[download]

outputs

4.2e-005
2.9e-005
2.8e-005
2.9e-005
2.8e-005
2.9e-005
2.9e-005
2.9e-005
2.9e-005
2.8e-005
2.8e-005
2.9e-005
2.9e-005
2.9e-005
2.9e-005
2.9e-005
2.9e-005
2.9e-005
2.9e-005
3e-005
2.9e-005
2.9e-005
3e-005
3e-005
2.9e-005
[download]

and does the same thing. Note I've used curly brackets ({}) to indicate repetition; see perlretut for more info.

Comment on Re: regexp puzzle Select or Download Code

Replies are listed 'Best First'.
Re^2: regexp puzzle by bcrowell2 (Friar) on Jan 06, 2011 at 01:35 UTC
Ah, I see -- thanks! The author of the article works for google, and he wrote software for them that would accept regexp from random users on the web, so he wanted something that would always take linear time. (OTOH, seems like he could simply use any random regexp engine and put a time limit on how long it was allowed to tun.)	[reply]