Re: Better solution to the code

Hi,

The poor performance comes from the fact that you are opening and parsing the same (big) file many times.

You would be better off reversing your strategy: open the file once, read it line by line, and compare each line with the contents of your array @tag.
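A minimal sketch of that reversed strategy, assuming @tag holds regex patterns. The patterns, the sample data, the in-memory filehandle and the helper name matching_lines are all made up for illustration; in your script the filehandle would point at the real input file.

```perl
use strict;
use warnings;

# Hypothetical patterns; in the real script these come from @tag.
my @tag = map { qr/$_/ } ('foo\d+', 'bar');

# Return every line of the filehandle that matches at least one pattern.
# The file is read exactly once; the patterns are looped over per line.
sub matching_lines {
    my ($fh, @patterns) = @_;
    my @hits;
    LINE: while (my $line = <$fh>) {
        for my $re (@patterns) {
            if ($line =~ $re) {
                push @hits, $line;
                next LINE;    # one hit is enough, move on to the next line
            }
        }
    }
    return @hits;
}

# Made-up sample data, read through an in-memory filehandle.
my $data = "foo42 first\nnothing here\nrebar second\n";
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!\n";
my @hits = matching_lines($in, @tag);
close $in;
print @hits;
```

Compiling the patterns once with qr// also avoids recompiling each regex for every line.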

You should also, if possible, consider loading your data into a hash instead of an array. If you do that, you can test each line with exists, which is a single hash lookup rather than a scan over the whole array.

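# your data is in %tag
open(my $in,  '<', 'Input_file.dat')  or die "Cannot read Input_file.dat: $!\n";
open(my $out, '>', 'Result_file.txt') or die "Cannot create Result_file.txt: $!\n";
while (my $line = <$in>) {
    chomp(my $key = $line);    # assumes the keys of %tag have no trailing newline
    print {$out} $line if exists $tag{$key};
}
close $out or die "Cannot close Result_file.txt: $!\n";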

Lu.


Re^2: Better solution to the code by moritz (Cardinal) on Jan 25, 2008 at 10:35 UTC
The idea with the hash won't work, because the regex match searches for a matching substring, whereas the hash lookup compares the whole string. But that reminds me of another possible optimization: if `@tag` doesn't contain regexes but only constant substrings, `index` might speed things up. So instead of `if ($_ =~ m/$something/){ ... }`, you can write `if (0 <= index $_, $something) { ... }`.
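To illustrate the `index` suggestion, here is a small sketch, assuming the tags are literal strings. The helper name `contains` and the sample strings are invented for this example. It also shows one difference worth knowing: `index` never interprets regex metacharacters, while an interpolated pattern does unless it is quoted with `\Q...\E`.

```perl
use strict;
use warnings;

# True if $line contains $needle as a literal substring.
# index returns -1 when the substring is absent, otherwise a 0-based offset.
sub contains {
    my ($line, $needle) = @_;
    return index($line, $needle) >= 0;
}

my $line = "price: a.b (approx)";    # made-up sample line
print "literal 'a.b' found\n"     if     contains($line, 'a.b');
print "literal 'a+b' not found\n" unless contains($line, 'a+b');

# A regex built from the same string treats '.' as a wildcard,
# unless the string is quoted with \Q...\E.
my $needle = 'a.b';
print "unquoted regex matches 'axb'\n"        if     'axb' =~ /$needle/;
print "quoted regex does not match 'axb'\n"   unless 'axb' =~ /\Q$needle\E/;
```

Whether `index` is actually faster than the regex for your data is something to measure, for instance with the Benchmark module.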
Re^2: Better solution to the code by cdarke (Prior) on Jan 25, 2008 at 12:56 UTC
BTW, to put @tags into %tags use: `my %tags; @tags{@tags} = undef;` Yes, it's confusing giving a hash and an array the same name.
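A short sketch of the hash-slice idiom followed by an `exists` lookup, with made-up sample values. It assigns the empty list instead of `undef`; for the purpose of `exists` the effect is the same, since every key ends up with an undefined value. As noted above, this only finds lines that equal a tag exactly, not lines that merely contain one.

```perl
use strict;
use warnings;

my @tags = ('alpha', 'beta');    # made-up sample values

# Hash slice: every element of @tags becomes a key. The values stay undef,
# which is fine because exists only checks for the key.
my %tags;
@tags{@tags} = ();

for my $line ("alpha\n", "alphabet\n", "beta\n") {
    chomp(my $key = $line);      # keys carry no newline, so strip it first
    print "$key\n" if exists $tags{$key};
}
```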