mdog has asked for the wisdom of the Perl Monks concerning the following question:
I want to find the most efficient / fastest way to compare an array of sorted data against a gzipped file of data that is also sorted on the same kind of key.
The array would look like:
@array = qw(item1 item2 item4 item5);
and the data in the file would look like:
blahv {TAB} ackg {TAB} item2
blahs {TAB} acka {TAB} item3
blaha {TAB} ackd {TAB} item4
I need to see if items in the array exist in the file and then do something with that line of data.
The rub is that the array has about 3 million values and the file has about 4 million lines, so my usual hackish brute-force foreach loops just won't cut it. I have looked at other nodes about comparing arrays but still can't get it right in my head.
I'm not even sure which way to process this: with the outer loop being the file or the array.
I know that once I am up to "item2" in the array I don't want to start scanning the file from the beginning, but I don't know the best way to accomplish that.
I know I could put the file in memory (seems like a bad idea; that's a lot of memory in use) and have the array be the outer loop. I could keep a counter that tells me where I left off in the file-content array and jump to that starting spot each time. Or, if I were to leave the file gzipped, I could have the counter keep jumping through the file until it hits that line number. But I'd like the pros' opinion if you don't mind.
Many thanks,
mdog
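Since both inputs are already sorted, one way to avoid rescanning the file is a single merge-style pass: read the file line by line and keep one index into the array that only ever moves forward. The following is a minimal sketch, not a definitive answer. It assumes both sides are sorted ascending in plain string order (`cmp`) and that the key is the last tab-separated field on each line; `merge_match` and its callback are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Walk a sorted array and a sorted, tab-separated stream in one pass.
# Assumption: both are sorted ascending by the same string order (cmp),
# and the key is the last tab-separated field of each line.
sub merge_match {
    my ($items, $fh, $callback) = @_;
    my $i = 0;                              # current position in the array
    while (my $line = <$fh>) {
        chomp $line;
        my $key = (split /\t/, $line)[-1];
        next unless defined $key;           # skip blank lines
        # Advance the array past items that sort before this line's key.
        $i++ while $i < @$items && ($items->[$i] cmp $key) < 0;
        last if $i >= @$items;              # array exhausted, nothing left to match
        $callback->($line) if $items->[$i] eq $key;
    }
    return;
}

# Demo: an in-memory filehandle stands in for the gzipped file.
my @array = qw(item1 item2 item4 item5);
my $data  = "blahv\tackg\titem2\nblahs\tacka\titem3\nblaha\tackd\titem4\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @matched;
merge_match(\@array, $fh, sub { push @matched, $_[0] });
close $fh;

print "$_\n" for @matched;
```

For the real data, the same function should work with a handle that decompresses on the fly, for example one from `IO::Uncompress::Gunzip->new($path)` or `open my $fh, '-|', 'gzip', '-dc', $path`, so the file never has to be held in memory. If the data is sorted some other way (numerically, say, so that item10 follows item9), the `cmp` comparison would need to be swapped for one that matches that order.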
Replies are listed 'Best First'.

Re: Efficiently compare sorted array vs another sorted array and pattern match
by Roy Johnson (Monsignor) on Apr 08, 2005 at 22:35 UTC

Re: Efficiently compare sorted array vs another sorted array and pattern match
by tlm (Prior) on Apr 09, 2005 at 02:11 UTC

Re: Efficiently compare sorted array vs another sorted array and pattern match
by eXile (Priest) on Apr 09, 2005 at 04:46 UTC

Re: Efficiently compare sorted array vs another sorted array and pattern match
by thekestrel (Friar) on Apr 09, 2005 at 15:02 UTC