Re: speeding up a regex

It's fairly easy to use the Benchmark module to try out different solutions and see which is fastest. Here's a sample that tests three approaches: multiple regexps (your original solution), one big regexp, and index():

#!/usr/bin/perl

use warnings;
use strict;
use Benchmark;

our @list = ('create the world','blah blah',
             'drop it already','foo schnerp',
             'need to delete','flip schnitzel',
             'send me that update!','a flibbertygibitz',
             'mailing insert collection','grand central station');

our @wordlist = qw(create drop delete update insert);
our @relist = map { qr/\b$_\b/ } @wordlist;
our $bigre_t = '\b(?:'.join('|',@wordlist).')\b';
our $bigre = qr/$bigre_t/;
print "bigre: $bigre\n";

sub several_re
{
  my $match = 0;
  foreach my $s (@list) {
    foreach my $re (@relist) {
      if ($s =~ /$re/) {
        $match++;
        last;
      }
    }
  }
  $match;
}

sub one_re
{
  my $match = 0;
  foreach my $s (@list) {
    if ($s =~ /$bigre/) {
      $match++;
    }
  }
  $match;
}

sub use_index
{
  my $match = 0;
  foreach my $s (@list) {
    foreach my $word (@wordlist) {
      if (index($s,$word) >= 0) {
        $match++;
        last;
      }
    }
  }
  $match;
}

print "several_re: ", several_re(),"\n";
print "one_re: ", one_re(),"\n";
print "use_index: ", use_index(),"\n";

timethese(100_000, {
  'Several Regexp' => \&several_re,
  'One Big Regexp' => \&one_re,
  'With index()'   => \&use_index,
});
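One caveat when reading these numbers: the index() version doesn't do quite the same job. It looks for plain substrings, so it ignores the \b word boundaries that both regexp versions use. On the sample data all three happen to report 5 matches, but on other input they can disagree. Here's a minimal sketch using a few made-up strings where each keyword appears only inside a longer word. (The quotemeta is an extra safety measure that isn't in the code above.)

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up strings: each keyword appears only inside a longer word.
my @tricky   = ('created a file', 'dropped packets', 'undelete this', 'updates pending');
my @wordlist = qw(create drop delete update insert);
my $alt      = join '|', map { quotemeta($_) } @wordlist;
my $bigre    = qr/\b(?:$alt)\b/;

my $re_hits    = 0;   # lines matched with word boundaries
my $index_hits = 0;   # lines matched as plain substrings
foreach my $s (@tricky) {
  $re_hits++ if $s =~ $bigre;
  # grep in boolean context: true if any keyword occurs anywhere in $s
  $index_hits++ if grep { index($s, $_) >= 0 } @wordlist;
}
print "regexp: $re_hits, index: $index_hits\n";   # prints "regexp: 0, index: 4"
```

If substring matches are acceptable for your problem, index() is a fair candidate. If you really need whole words, its timings aren't an apples-to-apples comparison.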

In this benchmark, the one big regexp solution is fastest:

Benchmark: timing 100000 iterations...
One Big Regexp:  7 wallclock secs (6.22 CPU) @ 16077.17/s (n=100000)
Several Regexp: 12 wallclock secs (11.27 CPU) @ 8873.11/s (n=100000)
With index(): 11 wallclock secs (8.71 CPU) @ 11481.06/s (n=100000)

But the results you get running on your own data will be more useful.
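To run the comparison on your own data, one option is to read the strings from a file instead of hard-coding them. The sketch below assumes a plain text file with one string per line, named as the first command-line argument. The file name and layout are assumptions; with no argument, the script falls back to three of the sample lines. It prints the match counts first, so you can see whether the approaches agree on your data. It then passes a negative count to timethese, which makes each sub run for at least one CPU second instead of a fixed number of iterations.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Benchmark qw(timethese);

# Assumption: your own lines live in a text file named on the command line,
# one string per line. Without an argument, a tiny sample list is used.
my @list;
if (@ARGV) {
  open my $fh, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
  chomp(@list = <$fh>);
  close $fh;
} else {
  @list = ('create the world', 'blah blah', 'mailing insert collection');
}

my @wordlist = qw(create drop delete update insert);
my @relist   = map { qr/\b\Q$_\E\b/ } @wordlist;
my $alt      = join '|', map { quotemeta($_) } @wordlist;
my $bigre    = qr/\b(?:$alt)\b/;

sub several_re {
  my $match = 0;
  foreach my $s (@list) {
    foreach my $re (@relist) {
      if ($s =~ $re) { $match++; last; }
    }
  }
  return $match;
}

sub one_re {
  my $match = 0;
  foreach my $s (@list) { $match++ if $s =~ $bigre; }
  return $match;
}

sub use_index {
  my $match = 0;
  foreach my $s (@list) {
    foreach my $word (@wordlist) {
      if (index($s, $word) >= 0) { $match++; last; }
    }
  }
  return $match;
}

# Check the counts first: if they differ, the approaches are not doing the
# same job on your data and the timings are not comparable.
my %count = (several_re => several_re(), one_re => one_re(), use_index => use_index());
print "$_: $count{$_}\n" for sort keys %count;

# A negative count means "run each sub for at least this many CPU seconds",
# so you don't have to pick an iteration count to suit the size of your file.
timethese(-1, {
  'Several Regexp' => \&several_re,
  'One Big Regexp' => \&one_re,
  'With index()'   => \&use_index,
});
```

For example, run it as `perl bench.pl mylines.txt`, where bench.pl and mylines.txt are whatever you've named the script and your data file.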

Re^2: speeding up a regex by GrandFather (Saint) on Jan 03, 2006 at 22:17 UTC

I like the results from cmpthese much better than those from timethese. Here is the same benchmark using cmpthese (benchmark code, 2 kB, not shown). It prints:

bigre: (?-xism:\b(?:create|drop|delete|update|insert)\b)
several_re: 5
one_re: 5
use_index: 5
                  Rate Several Regexp With index() One Big Regexp
Several Regexp 24634/s             --         -24%           -50%
With index()   32367/s            31%           --           -34%
One Big Regexp 49358/s           100%          52%             --

DWIM is Perl's answer to Gödel
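Here is a minimal sketch of how cmpthese fits in. This is not GrandFather's actual code. Benchmark's timethese returns a hash reference of results keyed by test name, and cmpthese accepts such a hash reference as well as a count and a code hash. That means a single timing run can produce both the timethese report and the comparison chart. To keep it short, it only compares two of the three approaches on a trimmed-down sample list.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Benchmark qw(timethese cmpthese);

my @list     = ('create the world', 'blah blah', 'drop it already', 'foo schnerp');
my @wordlist = qw(create drop delete update insert);
my $alt      = join '|', map { quotemeta($_) } @wordlist;
my $bigre    = qr/\b(?:$alt)\b/;

# Count lines matching the single alternation regexp.
sub one_re { return scalar grep { $_ =~ $bigre } @list }

# Count lines containing any keyword as a plain substring (no word boundaries).
sub use_index {
  my $match = 0;
  foreach my $s (@list) {
    foreach my $word (@wordlist) {
      if (index($s, $word) >= 0) { $match++; last; }
    }
  }
  return $match;
}

# Time both subs once: timethese prints its usual report and returns the
# results, which cmpthese then turns into the rate/percentage chart.
my $results = timethese(-1, {
  'One Big Regexp' => \&one_re,
  'With index()'   => \&use_index,
});
# cmpthese prints the chart and also returns it as an array ref of rows.
my $rows = cmpthese($results);
```

Calling cmpthese(-1, {...}) directly also works. It runs the timing itself and prints only the chart.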
