lukka has asked for the wisdom of the Perl Monks concerning the following question:
The problem is that the script works acceptably when the file and array are small: if BUFFER.dat has 10000 lines and @validbufs has 28 elements, it finishes in 2 minutes. But as soon as BUFFER.dat has a large number of lines (e.g. 322129) and @validbufs has 200000 elements, it takes hours!

Please note: @validbufs is a list of unique strings, and in BUFFER.dat the first column is also always unique; there are no duplicates in the input data (neither in BUFFER.dat nor in @validbufs). @validbufs can have anywhere from about 50 to 200000 elements. So essentially, if @validbufs has 50 elements, the script should print the 50 lines of BUFFER.dat that match them; if it has 200000 elements, it should print the 200000 matching lines.

I tried splitting the big file BUFFER.dat into chunks of 1000 lines each and doing the lookup on the split files, but even that is very slow (it takes hours). Can you please suggest a fast way to do this lookup?

```perl
my %linecontainsbuf = ();
while ($line = <BUFFER>) {
    @fields      = split /\'/, $line;
    $searchfield = $fields[1];
    # this same assignment is repeated once per element of @validbufs
    $linecontainsbuf{$searchfield} = $line for @validbufs;
}
foreach $validbuf (@validbufs) {
    print $linecontainsbuf{$validbuf};
}
```
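The timings point at the loop body: `$linecontainsbuf{$searchfield} = $line for @validbufs;` performs the identical assignment once for every element of @validbufs, so the script does lines × elements operations instead of one per line. Below is a minimal sketch of the single-assignment version, assuming the same single-quote-delimited field layout as the question; the file name and the sample @validbufs contents are placeholders standing in for the poster's real data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder inputs, standing in for the poster's real data.
open my $buffer_fh, '<', 'BUFFER.dat' or die "Can't open BUFFER.dat: $!";
my @validbufs = ('abc', 'def');

# One pass over the file: index each line by its (unique) second
# single-quote-delimited field. No inner loop over @validbufs.
my %line_for;
while ( my $line = <$buffer_fh> ) {
    my $key = ( split /'/, $line )[1];
    $line_for{$key} = $line if defined $key;
}
close $buffer_fh;

# One pass over the wanted keys: each hash lookup is O(1) on average.
for my $validbuf (@validbufs) {
    print $line_for{$validbuf} if exists $line_for{$validbuf};
}
```

With one assignment per input line and one hash lookup per wanted key, the whole job is two linear passes, so 322129 lines against 200000 keys should take seconds rather than hours.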
Replies are listed 'Best First'.

Re: how to speed lookups?
by ikegami (Patriarch) on Nov 11, 2008 at 21:06 UTC
    by lukka (Novice) on Nov 11, 2008 at 21:36 UTC
    by blazar (Canon) on Nov 12, 2008 at 19:04 UTC
    by ikegami (Patriarch) on Nov 12, 2008 at 19:39 UTC

Re: how to speed lookups?
by dragonchild (Archbishop) on Nov 11, 2008 at 21:01 UTC

Re: how to speed lookups?
by duckyd (Hermit) on Nov 12, 2008 at 00:45 UTC
    by lukka (Novice) on Nov 12, 2008 at 11:44 UTC

Re: how to speed lookups?
by blazar (Canon) on Nov 12, 2008 at 19:55 UTC