in reply to speeding up a regex

It's fairly easy to use Benchmark to try out different solutions and see which is fastest. Here's a sample of testing multiple regexps (your original solution), one big regexp, and using index:
#!/usr/bin/perl
use warnings;
use strict;
use Benchmark;

our @list = ('create the world','blah blah',
             'drop it already','foo schnerp',
             'need to delete','flip schnitzel',
             'send me that update!','a flibbertygibitz',
             'mailing insert collection','grand central station');
our @wordlist = qw(create drop delete update insert);
our @relist = map { qr/\b$_\b/ } @wordlist;
our $bigre_t = '\b(?:'.join('|',@wordlist).')\b';
our $bigre = qr/$bigre_t/;
print "bigre: $bigre\n";

sub several_re
{
  my $match = 0;
  foreach my $s (@list)
  {
    foreach my $re (@relist)
    {
      if ($s =~ /$re/)
      {
        $match++;
        last;
      }
    }
  }
  $match;
}

sub one_re
{
  my $match = 0;
  foreach my $s (@list)
  {
    if ($s =~ /$bigre/)
    {
      $match++;
    }
  }
  $match;
}

sub use_index
{
  my $match = 0;
  foreach my $s (@list)
  {
    foreach my $word (@wordlist)
    {
      if (index($s,$word) >= 0)
      {
        $match++;
        last;
      }
    }
  }
  $match;
}

print "several_re: ", several_re(),"\n";
print "one_re: ", one_re(),"\n";
print "use_index: ", use_index(),"\n";

timethese(100_000, {
  'Several Regexp' => \&several_re,
  'One Big Regexp' => \&one_re,
  'With index()' => \&use_index,
});

In this benchmark, the one big regexp solution is fastest:

Benchmark: timing 100000 iterations...
One Big Regexp: 7 wallclock secs (6.22 CPU) @ 16077.17/s (n=100000)
Several Regexp: 12 wallclock secs (11.27 CPU) @ 8873.11/s (n=100000)
  With index(): 11 wallclock secs (8.71 CPU) @ 11481.06/s (n=100000)

But the results you get running on your own data will be more useful.

Re^2: speeding up a regex
by GrandFather (Saint) on Jan 03, 2006 at 22:17 UTC

    I like the results from cmpthese much better than those from timethese. Here is the same benchmark using cmpthese.
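The listing that accompanied this reply is not preserved here. What follows is a minimal sketch of how the parent benchmark might look with cmpthese swapped in for timethese, assuming the same data and subs as the parent post. cmpthese is not exported by Benchmark by default, so it is imported by name. The iteration count of 100_000 is carried over from the parent and may not be what the reply actually used.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Benchmark qw(cmpthese);   # cmpthese must be requested explicitly

# Same test data and word list as the parent post.
our @list = ('create the world','blah blah',
             'drop it already','foo schnerp',
             'need to delete','flip schnitzel',
             'send me that update!','a flibbertygibitz',
             'mailing insert collection','grand central station');
our @wordlist = qw(create drop delete update insert);
our @relist = map { qr/\b$_\b/ } @wordlist;
our $bigre_t = '\b(?:'.join('|',@wordlist).')\b';
our $bigre = qr/$bigre_t/;
print "bigre: $bigre\n";

# One precompiled regexp per word; stop at the first hit per string.
sub several_re {
  my $match = 0;
  foreach my $s (@list) {
    foreach my $re (@relist) {
      if ($s =~ /$re/) { $match++; last; }
    }
  }
  $match;
}

# A single alternation covering every word.
sub one_re {
  my $match = 0;
  foreach my $s (@list) {
    $match++ if $s =~ /$bigre/;
  }
  $match;
}

# Plain substring search; note that index() ignores word boundaries.
sub use_index {
  my $match = 0;
  foreach my $s (@list) {
    foreach my $word (@wordlist) {
      if (index($s,$word) >= 0) { $match++; last; }
    }
  }
  $match;
}

print "several_re: ", several_re(),"\n";
print "one_re: ", one_re(),"\n";
print "use_index: ", use_index(),"\n";

# Assumed count: 100_000 iterations, as in the parent post.
# A negative count (e.g. -3) would instead run each sub for at
# least that many CPU seconds.
cmpthese(100_000, {
  'Several Regexp' => \&several_re,
  'One Big Regexp' => \&one_re,
  'With index()'   => \&use_index,
});
```

cmpthese prints a rate for each sub, slowest first, plus a grid of percentage differences, which makes the relative speeds easier to read than the raw timethese timings.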

    Prints:

    bigre: (?-xism:\b(?:create|drop|delete|update|insert)\b)
    several_re: 5
    one_re: 5
    use_index: 5
                      Rate Several Regexp   With index() One Big Regexp
    Several Regexp 24634/s             --           -24%           -50%
    With index()   32367/s            31%             --           -34%
    One Big Regexp 49358/s           100%            52%             --

    DWIM is Perl's answer to Gödel