in reply to Regex optimizations

Using 5.005_03, I got almost identical times with the following code: 41 seconds for \d{2} and 40 seconds for \d\d (and since time() only has one-second resolution, that difference could just be timer granularity).

use strict;

speed_test('\d{2}', sub { shift() =~ /\d{2}/ });
speed_test('\d\d',  sub { shift() =~ /\d\d/ });

sub speed_test {
    my ($name, $func) = @_;
    my $start = time;
    my $index = 10000000;
    while ($index-- > 0) {
        &$func($index);
    }
    my $end = time - $start;
    print "Total time for $name: $end\n";
}
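For what it's worth, one way around the one-second granularity of time() is Time::HiRes (assuming it's available on your build; it wasn't core in 5.005). A minimal sketch of the same harness with sub-second timing:

use strict;
use Time::HiRes qw(gettimeofday tv_interval);   # assumption: module is installed

speed_test('\d{2}', sub { shift() =~ /\d{2}/ });
speed_test('\d\d',  sub { shift() =~ /\d\d/ });

sub speed_test {
    my ($name, $func) = @_;
    my $start = [gettimeofday];          # high-resolution start time
    my $index = 10000000;
    while ($index-- > 0) {
        &$func($index);
    }
    my $elapsed = tv_interval($start);   # fractional seconds elapsed
    printf "Total time for %s: %.3f seconds\n", $name, $elapsed;
}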


It just occurred to me that the subroutine indirection might be the bottleneck, but when I wrote a new version without it, the times were still identical. I also tested other counts, such as \d{5} (and the corresponding \d\d\d\d\d), but that had no effect on the timings either.
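For reference, an inlined variant without the sub call would look something like this sketch (the exact rewrite may have differed):

use strict;

my $start = time;
my $index = 10000000;
while ($index-- > 0) {
    $index =~ /\d{2}/;    # run once with /\d{2}/, once with /\d\d/
}
print "Total time for \\d{2}: ", time - $start, " seconds\n";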

YMMV.

-Ted

(Ovid) RE(2): Regex optimizations
by Ovid (Cardinal) on Oct 25, 2000 at 01:16 UTC
    Some of you may have wondered why I didn't just benchmark it myself. Because I was being stupid. I looked at tedv's benchmark and couldn't believe I hadn't bothered to check. Blame it on over two weeks straight of working without a day off :(

    A critical part of regex optimization is what happens on failure. I created a string that forces at least one failure for the regex and then benchmarked the two forms:

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    my ( $re, $test );
    timethese(-30, {
        regex1 => '$re   = "a" . \'\d\'x500 . "a";
                   $test = ("a"x2000 . "1"x499)x2 . "a" . "1"x500 . ("a"x2000 . "1"x499)x2;
                   $test =~ /$re/o;',
        regex2 => '$re   = "a" . \'\d{500}\' . "a";
                   $test = ("a"x2000 . "1"x499)x2 . "a" . "1"x500 . ("a"x2000 . "1"x499)x2;
                   $test =~ /$re/o;',
    });
    Results:
    Benchmark: running regex1, regex2, each for at least 30 CPU seconds...
        regex1: 31 wallclock secs (31.31 usr + 0.00 sys = 31.31 CPU) @ 655.10/s (n=20508)
        regex2: 31 wallclock secs (30.84 usr + 0.00 sys = 30.84 CPU) @ 530.35/s (n=16358)
    After running this a couple of times, I see that the \d{n} form is less efficient than the string of explicit \d's, though not by much. If I'm iterating over terribly convoluted data, though, it could become an issue.
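    For the two-digit case that started the thread, a direct comparison could be run along these lines (a sketch only; the test string is arbitrary, built so each lone digit starts a match that then fails):

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    # Mostly letters with lone digits scattered in, so \d matches start
    # and then fail before a second digit is found; one "12" at the end
    # lets both patterns eventually succeed.
    my $test = ('a' x 50 . '1') x 200 . '12';

    timethese(-10, {
        braced   => sub { $test =~ /\d{2}/ },
        repeated => sub { $test =~ /\d\d/  },
    });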

    I still don't see why I didn't benchmark it before asking. Thanks for slapping sense into me, tedv :)

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just go to the link and check out our stats.