in reply to need more efficient way to write this query script

is there a better way to make the searching faster?

Indeed there is. You can do away with the inner foreach loop (and a lot of regex recompiling) by constructing a regex and compiling it once. Change

    while ($data = <DB>) {
        chomp($data);
        foreach $id (@id) {
            if ($data =~ /$id/) {
                print OUT "$data\n";
            }
        }
    }

to

    my $regex = '(?:' . join('|', @id) . ')';
    while ( $data = <DB> ) {
        print OUT $data if $data =~ /$regex/o;
    }

The (?: ... ) surrounding the regex is to avoid setting $1 as a side effect. This saves a tiny bit of work.

The /o modifier on the regex invocation says to compile the regex once (a big win). Bye bye, inner loop.

Note that you also do away with the chomp(), since it doesn't affect the regex, and you'll need a newline on the end of the string anyway.
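
On more recent perls the same compile-once idea is usually written with qr// instead of /o. What follows is only a minimal sketch, assuming the IDs are literal strings (hence the quotemeta; drop it if the IDs are meant to be patterns, as in the original loop). The sub name build_id_regex and the sample data are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Build one compiled alternation from a list of literal IDs.
# Assumption: IDs are plain strings, so quotemeta() escapes any
# regex metacharacters they might contain.
sub build_id_regex {
    my @ids = @_;
    my $alternation = join '|', map { quotemeta } @ids;
    return qr/(?:$alternation)/;
}

# Hypothetical sample data standing in for @id and the DB file.
my @id    = ('AB123', 'CD.456');
my @lines = ("AB123 first record\n", "CDx456 no match\n", "zz CD.456 third\n");

my $regex   = build_id_regex(@id);
my @matched = grep { $_ =~ $regex } @lines;
print @matched;
```

Note that an empty ID list would yield a pattern that matches every line, so a real script would probably want to check for that first.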

Re: Re: need more efficient way to write this query script
by Ovid (Cardinal) on Aug 16, 2002 at 04:52 UTC

    Eek! While I suspect that would be faster, it's also likely to involve a huge amount of backtracking if you have many IDs. Further, if the IDs are similar, performance will degrade even more. A straight hash lookup is going to be faster.

    Of course, showing people how to dynamically build regexes (even simple ones) is always a nice trick :)
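
    For what it's worth, the hash lookup could look roughly like the sketch below. It assumes the ID is the first whitespace-delimited field of each line, which the original post does not confirm, and the sample data is made up. Unlike the regex, it only matches whole fields, not substrings.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data standing in for @id and the DB file.
    my @id    = ('AB123', 'CD456');
    my @lines = ("AB123 first record\n", "EF789 second record\n", "CD456 third record\n");

    # Build the lookup table once; each key is a wanted ID.
    my %wanted = map { $_ => 1 } @id;

    my @matched;
    for my $data (@lines) {
        # Assumption: the ID is the first whitespace-delimited field.
        my ($key) = split ' ', $data;
        push @matched, $data if defined $key && $wanted{$key};
    }
    print @matched;
    ```

    Each line then costs one split and one hash probe, no matter how many IDs there are.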

    Cheers,
    Ovid


      While I suspect that would be faster, it's also likely to involve a huge amount of backtracking if you have many IDs.

      True, but the target strings shown are pretty small. I'd benchmark this one before making assumptions about where the cutover point between using a regex and using a hash lies. I suspect that for a small number of keys, the regex wins.
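
      A benchmark along those lines might be sketched with the core Benchmark module as follows. The data is entirely made up (50 IDs, 1000 lines, ID as the first field), so the numbers say nothing about the original poster's files; vary the size of @id to look for the cutover point.

      ```perl
      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      # Hypothetical data: 50 IDs and 1000 lines whose first field is an ID.
      my @id    = map { sprintf 'ID%04d', $_ * 7 } 1 .. 50;
      my @lines = map { sprintf "ID%04d some payload\n", $_ } 1 .. 1000;

      # Compile the alternation once and build the hash once, outside the timed code.
      my $alternation = join '|', map { quotemeta } @id;
      my $regex       = qr/^(?:$alternation)\b/;
      my %wanted      = map { $_ => 1 } @id;

      # Each sub returns the number of lines it selects.
      sub by_regex { return scalar grep { $_ =~ $regex } @lines }
      sub by_hash {
          return scalar grep { my ($key) = split ' ', $_; defined $key && $wanted{$key} } @lines;
      }

      # Sanity check: both approaches should select the same number of lines.
      my ($regex_count, $hash_count) = (by_regex(), by_hash());
      die "approaches disagree" unless $regex_count == $hash_count;

      # Run each candidate for about one CPU second and print a comparison table.
      cmpthese(-1, { regex => \&by_regex, hash => \&by_hash });
      ```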


      Hm... the original post doesn't give us much guidance about the cardinality of the key file. Given that, I'd probably go with a hash after all.