in reply to Re^2: speeding up a regex
in thread speeding up a regex

In modern Perls -- I'm not sure which versions qualify here, maybe 5.6+ -- Perl will check whether the contents of the variable have changed since the regexp was last compiled. If the contents have not changed, the regexp is not recompiled.

For example, compare

my @words = (
    'foo',
    'bar',
);

foreach (@array) {
    foreach my $word (@words) {
        if (/\Q$word/) {  # BAD!! $word always changes.
            print;
            last;
        }
    }
}

to

my @words = (
    'foo',
    'bar',
);

foreach my $word (@words) {
    foreach (@array) {
        if (/\Q$word/) {  # GOOD!! regexp only recompiled when needed.
            print;
            last;
        }
    }
}
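
If you want to check the difference on your own Perl, a rough Benchmark sketch along these lines should do. The sample data and the sub names (words_inner, words_outer) are made up for illustration, and the subs count matches instead of printing so that the output doesn't dominate the timings. The numbers will depend on your Perl version and your data, so treat the result as a hint rather than a measurement.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Hypothetical sample data, not taken from the original question.
my @words = ( 'foo', 'bar' );
my @array = map { "line $_ " . ( $_ % 3 ? 'foo' : 'baz' ) } 1 .. 1000;

# The interpolated variable changes on every match attempt.
sub words_inner {
    my $count = 0;
    foreach (@array) {
        foreach my $word (@words) {
            $count++ if /\Q$word/;
        }
    }
    return $count;
}

# The interpolated variable changes only once per pass over @array.
sub words_outer {
    my $count = 0;
    foreach my $word (@words) {
        foreach (@array) {
            $count++ if /\Q$word/;
        }
    }
    return $count;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    words_inner => \&words_inner,
    words_outer => \&words_outer,
} );
```

Both subs do the same amount of matching and return the same count; only the loop order differs.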

It's not always practical to change the order of the loops, for example when one of them reads from a file. In that case, the solution is to precompile the regexps. For example, compare

my @words = (
    'foo',
    'bar',
);

while (<FILE>) {
    foreach my $word (@words) {
        if (/\Q$word/) {  # BAD!! $word always changes.
            print;
            last;
        }
    }
}

to

my @words = (
    'foo',
    'bar',
);

# Precompile the regexps.
my @regexps = map { qr/\Q$_/ } @words;

while (<FILE>) {
    foreach my $regexp (@regexps) {
        if (/$regexp/) {       # GOOD!! $regexp is a compiled regexp.
        #if ($_ =~ $regexp) {  # GOOD!! Alternate syntax.
            print;
            last;
        }
    }
}
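
Here is a self-contained version of the precompiled approach that you can run as-is. It's only a sketch: the input text is made up, an in-memory filehandle stands in for the real file, and the matching lines are collected in @matched (a name I've introduced) before being printed.

```perl
use strict;
use warnings;

my @words = ( 'foo', 'bar' );

# Compile each pattern once, up front.
my @regexps = map { qr/\Q$_/ } @words;

# Hypothetical input; an in-memory filehandle stands in for a real file.
my $input = "a foo line\nnothing here\nfoo and bar\nbar only\n";
open( my $fh, '<', \$input ) or die "Cannot open in-memory file: $!";

my @matched;
while ( my $line = <$fh> ) {
    foreach my $regexp (@regexps) {
        if ( $line =~ $regexp ) {
            push @matched, $line;
            last;    # Stop at the first word that matches this line.
        }
    }
}
close($fh);

print @matched;
```

Because each element of @regexps is already a compiled regexp object, walking through them in the inner loop doesn't trigger a recompile.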

If you're trying to match constant strings rather than regexps, then I recommend Regexp::List:

use Regexp::List ();

my @words = (
    'foo',
    'bar',
);

my $regexp = Regexp::List->new()->list2re(@words);

while (<FILE>) {
    print if /$regexp/;
    #print if $_ =~ $regexp;  # Alternate syntax.
}
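
If you can't install Regexp::List, a plain alternation built with core Perl is a simpler stand-in. To be clear, this is a different technique from what the module does: it just joins the quoted words with | and makes no attempt to optimise the resulting pattern. The word list and sample lines below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical word list; 'a.b' is there to show why quoting matters.
my @words = ( 'foo', 'bar', 'a.b' );

# Quote each word so metacharacters are matched literally,
# then combine everything into one precompiled alternation.
my $alternation = join '|', map { quotemeta($_) } @words;
my $regexp      = qr/(?:$alternation)/;

my @lines = (
    "a foo line\n",
    "nothing here\n",
    "a.b\n",
    "axb\n",
    "bar only\n",
);

# One match attempt per line, whatever the number of words.
my @matched = grep { $_ =~ $regexp } @lines;
print @matched;
```

The line "axb" is not printed, because quotemeta turns the dot in 'a.b' into a literal dot.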

By the way,
for ($loop_index = 0; $loop_index < $#patterns; $loop_index++) {
is much less readable and no more efficient than
for my $loop_index (0..$#patterns) {
You could also have used
foreach (@patterns) {

Finally, in your case, I'd use

my @patterns = qw( create drop delete update insert );

my $regexp;
$regexp = Regexp::List->new(modifiers => 'i')->list2re(@patterns);
$regexp = qr/\b(?:$regexp)\b/;

...

while ($data = $sth->fetchrow_arrayref()) {
    # index is faster than regexps on constant strings.
    next if index(lc($data->[10]), 'tempdb') >= 0;

    if ($data->[13] =~ $regexp) {
        print "$data->[3] $data->[9] $data->[10] $data->[13]\n";
        last;
    }
}

Update: Bug Fix: Changed $word to $_ in map's code block.

Re^4: speeding up a regex
by sgifford (Prior) on Jan 03, 2006 at 19:59 UTC
    In modern Perls -- I'm not sure which versions qualify here, maybe 5.6+ -- Perl will check whether the contents of the variable have changed since the regexp was last compiled. If the contents have not changed, the regexp is not recompiled.
    ...
    foreach my $word (@words) { if (/\Q$word/) { # BAD!! $word always changes.
    Hi ikegami,

    Thanks for the very informative post! A quick question: I thought that in a loop like the one above, the iterator variable ($word) was temporarily aliased to each value of the array. So I would expect that as long as the array's contents didn't change, perl would know not to recompile the RE, and so both of your above examples would be the same speed.

    But a quick Benchmark agrees with you: putting the RE in the outer loop is about twice as fast as in the inner loop.

    Any hints as to what's wrong with my understanding of variable aliasing or RE caching?

    Thanks!

      What matters is the value of $word, because we're using the value of $word to compile the regexp. While the value of the variable to which $word is/was aliased doesn't change, the value of $word itself does change.

      In the first pass, $word is "foo", so the regexp is compiled to /foo/. In the second pass, $word is "bar". We obviously need to recompile the regexp because we want /bar/ and it's currently /foo/. Whether $words[0] and $words[1] changed or not is completely irrelevant.

      You might be thinking that the compiled regexp is stored with the variable used in the regexp. It's not. That wouldn't work when no variables or multiple variables are used to create the regexps. Instead, the uncompiled regexp or the values of the variables used to compile the regexp -- I don't know which -- is stored along with the compiled regexp in the code.

Re^4: speeding up a regex
by halley (Prior) on Jan 03, 2006 at 17:03 UTC
    Every once in a while, I see a post like this and wish I could ++ it more than once. For newcomers to a language, seeing "badform/goodform" examples is really a great way to understand the benefits of different approaches.

    --
    [ e d @ h a l l e y . c c ]