Re: Regexes are slow (or, why I advocate String::Index)

The problem with Perl's regular expressions is that they're so easy to use that, once you learn them, it's tempting to use them for everything. A lot of skilled Perl hackers actually shun the string manipulation functions as crutches of the novice Perl programmer.

Any time I need to find or extract a specific character or string from within another string, I use the index(), rindex(), and substr() functions instead of resorting to the regex engine. The real downside is that you run the risk of looking like a novice to your peers by not using regexes. I guess if anyone ever calls me on it, I'll just say, "So what? Lincoln Stein does it that way too!" :)
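To make the "find or extract" part concrete, here is a minimal sketch of pulling substrings out with index(), rindex(), and substr() alone. The helper names between() and after_last() and the sample string are made up for illustration; they are not from any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $line = "user=alice;host=example.org";

# Return the text between $start and the next $end,
# or undef if either delimiter is missing.
sub between {
    my ($str, $start, $end) = @_;
    my $from = index($str, $start);
    return if $from < 0;
    $from += length($start);              # skip past the opening delimiter
    my $to = index($str, $end, $from);    # search only after that point
    return if $to < 0;
    return substr($str, $from, $to - $from);
}

# Return everything after the last occurrence of $sep,
# or undef if $sep does not occur.
sub after_last {
    my ($str, $sep) = @_;
    my $pos = rindex($str, $sep);
    return if $pos < 0;
    return substr($str, $pos + length($sep));
}

my $user = between($line, "user=", ";");    # "alice"
my $host = after_last($line, "=");          # "example.org"
print "$user\n$host\n";
```

Both helpers return undef on a miss, so the caller can tell "not found" apart from an empty result, much as you would check whether a match succeeded before reading $1.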

#!/usr/bin/perl -w
use strict;
use Benchmark 'cmpthese';

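# 4800-character test string: "this is a string" repeated 300 times
my $string = "this is a string" x 300;

cmpthese(10000000, {
    'index' => sub {
        my $res;

        $res = index($string, "this", 0);   # found at the very start
        $res = index($string, "string");    # first occurrence from the left
        $res = rindex($string, "string");   # last occurrence, at the very end
    },
    regex   => sub {
        my $res;

        $res = $string =~ /^this/;          # anchored at the start
        $res = $string =~ /string/;         # unanchored search
        $res = $string =~ /string$/;        # anchored at the end
    }
});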
...

Benchmark: timing 10000000 iterations of index, regex...
     index: 10 wallclock secs (10.21 usr +  0.00 sys = 10.21 CPU) @ 979431.93/s (n=10000000)
     regex: 20 wallclock secs (20.05 usr +  0.00 sys = 20.05 CPU) @ 498753.12/s (n=10000000)
          Rate regex index
regex 498753/s    --  -49%
index 979432/s   96%    --
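One thing worth keeping in mind when reading these numbers: index() and rindex() return a position (or -1), while the matches return true or false, so the two subs are not asking exactly the same questions. A rough sketch of yes/no equivalents for the three regexes follows. The helper names starts_with(), contains(), and ends_with() are hypothetical, and ends_with() behaves like /string\z/ rather than /string$/, since $ also matches before a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "this is a string" x 300;

# Like /^\Qprefix\E/: compare only the leading characters.
sub starts_with {
    my ($s, $p) = @_;
    return substr($s, 0, length($p)) eq $p ? 1 : 0;
}

# Like /\Qneedle\E/: index() returns -1 when nothing is found.
sub contains {
    my ($s, $p) = @_;
    return index($s, $p) >= 0 ? 1 : 0;
}

# Like /\Qsuffix\E\z/: compare only the trailing characters.
sub ends_with {
    my ($s, $p) = @_;
    my $n = length($p);
    return 1 if $n == 0;             # the empty string ends everything
    return 0 if $n > length($s);     # suffix longer than the string
    return substr($s, -$n) eq $p ? 1 : 0;
}

# Check that both approaches agree on the benchmark string.
printf "starts: %d vs %d\n", starts_with($string, "this"),   ($string =~ /^this/    ? 1 : 0);
printf "has:    %d vs %d\n", contains($string, "string"),    ($string =~ /string/   ? 1 : 0);
printf "ends:   %d vs %d\n", ends_with($string, "string"),   ($string =~ /string\z/ ? 1 : 0);
```

The timings above were not re-run with these helpers, so treat this only as a way to check that both sides give the same answers before comparing their speed.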
