In reply to: Re: Re: speeding up regex (thread: speeding up regex)

Try replacing

$count++ while $text =~ /$gene/g; # Count number of instances

with

my $patn = qr/\b$gene\b/;
$count++ while $text =~ /$patn/g;
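A minimal, self-contained sketch of the suggested change, wrapped in a hypothetical helper `count_gene`. The helper names, the sample text and the `\Q...\E` quoting are assumptions, not part of the suggestion above. `\Q...\E` makes characters such as `.` or `(` in a gene name match literally. The second helper shows a common alternative to the counting loop: a list-context `//g` match assigned to an empty list, which yields the number of matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count whole-word occurrences of $gene in $text.
sub count_gene {
    my ($text, $gene) = @_;
    # Compile the pattern once; \Q...\E quotes any regex metacharacters
    # in $gene (an addition to the original suggestion).
    my $patn = qr/\b\Q$gene\E\b/;
    my $count = 0;
    $count++ while $text =~ /$patn/g;   # scalar //g steps through the matches
    return $count;
}

# Alternative: //g in list context returns every match; assigning that list
# to () and reading the assignment in scalar context gives the match count.
sub count_gene_list {
    my ($text, $gene) = @_;
    my $patn = qr/\b\Q$gene\E\b/;
    my $count = () = $text =~ /$patn/g;
    return $count;
}

my $text = 'TP53 and MDM2 bind; TP53 is mutated.';   # made-up sample text
print count_gene($text, 'TP53'), "\n";        # prints 2
print count_gene_list($text, 'TP53'), "\n";   # prints 2
```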

Re: Compile the regex
by dbwiz (Curate) on Jul 18, 2003 at 15:20 UTC

    Precompiling the keyword patterns can make a difference if you compile all of them at once, before the loop.

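    #!/usr/bin/perl -w
    use strict;

    # Build one precompiled pattern per keyword, once, before any searching.
    open my $words, '<', 'kwords' or die "Cannot open kwords: $!";
    my %kwords = ();
    while (<$words>) {
        chomp;
        $kwords{$_} = qr/\b$_\b/m;
    }
    close $words;

    my %found = ();
    for my $f (glob 'abstract*') {
        local $/;                  # slurp mode: read each abstract in one go
        open my $fh, '<', $f or die "Cannot open $f: $!";
        my $text = <$fh>;
        close $fh;
        # Record every keyword whose precompiled pattern matches this abstract.
        for (keys %kwords) {
            my $val = $kwords{$_};
            $found{$f} .= "$_ " if $text =~ /$val/;
        }
    }
    print "$_\t$found{$_}\n" for sort keys %found;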

    Assuming the keywords are in one file and each abstract is in its own file, precompiling made the search 30% faster in my test (1000 test files of 300 words each, with 3 random keywords in two thirds of them).
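    A rough way to check a claim like this is the core Benchmark module. The sketch below uses made-up keywords and a synthetic text (both assumptions, not the test data described above) and compares interpolating each keyword into a fresh pattern inside the loop against reusing `qr//` objects built once beforehand. Because the interpolated pattern text changes on every iteration, perl has to recompile it each time, which is the situation where precompiling can pay off; actual timings will vary with the perl version and the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data (assumption): 50 keywords and a text containing three of them.
my @keywords = map { "gene$_" } 1 .. 50;
my $text = join ' ', ('lorem ipsum') x 150, 'gene7', 'gene23', 'gene42';

# Variant 1: interpolate the keyword into the pattern on every match.
sub find_interpolated {
    my @hits;
    for my $k (@keywords) {
        push @hits, $k if $text =~ /\b$k\b/;   # pattern text differs each time
    }
    return @hits;
}

# Variant 2: compile every keyword once, before any searching.
my %compiled = map { $_ => qr/\b$_\b/ } @keywords;
sub find_precompiled {
    my @hits;
    for my $k (@keywords) {
        push @hits, $k if $text =~ $compiled{$k};
    }
    return @hits;
}

# Both variants must agree before their speed is worth comparing.
my @via_interp = find_interpolated();
my @via_qr     = find_precompiled();
die "results differ\n" unless "@via_interp" eq "@via_qr";

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    interpolated => \&find_interpolated,
    precompiled  => \&find_precompiled,
});
```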

Re: Compile the regex
by Abigail-II (Bishop) on Jul 18, 2003 at 11:12 UTC
    Eh, could you give an example string and pattern where compiling makes a difference? I've tried several patterns and strings, but Benchmark never shows a difference of more than 1%.

    The \b could make a difference, but it also changes which occurrences match, and so far it's unclear whether a \b is justified at all.
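    A small sketch of that semantic difference, using a made-up sentence and a hypothetical `count_matches` helper (neither comes from the thread): without `\b` the pattern also counts `p53` inside `p530`, while with `\b` it counts only whole-word occurrences. Which count is correct depends on the data, not on speed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count all matches of a precompiled pattern in a string.
sub count_matches {
    my ($text, $re) = @_;
    my $count = () = $text =~ /$re/g;
    return $count;
}

my $text = 'p53 regulates p530 and p53-dependent genes';   # made-up text
my $gene = 'p53';

my $plain   = qr/\Q$gene\E/;       # matches p53 anywhere, even inside p530
my $bounded = qr/\b\Q$gene\E\b/;   # whole-word matches only

print count_matches($text, $plain),   "\n";   # prints 3
print count_matches($text, $bounded), "\n";   # prints 2
```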

    Abigail