krazken has asked for the wisdom of the Perl Monks concerning the following question:
Here I have 3 distinct records.

##layout is as follows..
ID|name|address|city|state|zip|phone|matchkey
1|krazken|123 Main|BFE|AR|72210|555-2345|1
2|kraken||||||1
3|krayken|||||555-2345|1

I have done something to match them together, as noted by the matchkey. I know that they are duplicate records even though there is variation in the name. My question is: does anyone have a good way of grouping these records together so that I can populate data where fields are missing? Basically I want to do an overlay, for those who have heard of that before. My output should look like:

1|krazken|123 Main|BFE|AR|72210|555-2345|1
2|kraken|123 Main|BFE|AR|72210|555-2345|1
3|krayken|123 Main|BFE|AR|72210|555-2345|1

I have tried anonymous hashes keyed on the matchkey, and that works OK for small stuff, but when you have flat files with millions of records in them, this gets expensive in a hurry, and I usually run out of memory. I have tried tie'ing my hashes with DB_File to save memory, but that is just too dang slow. So I was wondering if anyone has done this type of stuff in Perl, and if so, how did you do it?
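
For illustration only, here is a minimal sketch of one way the overlay could be done without holding the whole file in a hash. It is not taken from any reply in this thread, and it rests on assumptions that the post does not state: the file has already been sorted on the matchkey column with an external tool (for example `sort -t'|' -k8,8`), it has no header line, and every record carries all eight fields. The names `overlay_group` and `overlay_stream` are hypothetical. Because records with the same matchkey are adjacent after sorting, only one group needs to sit in memory at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Overlay one group of records (array refs of fields). For every column
# except the ID (first) and the matchkey (last), take the first non-empty
# value found in the group and copy it into records where it is empty.
sub overlay_group {
    my @group = @_;
    return unless @group;
    my $last = $#{ $group[0] };    # index of the matchkey column
    for my $col (1 .. $last - 1) {
        my ($fill) = grep { defined($_) && length($_) }
                     map  { $_->[$col] } @group;
        next unless defined $fill;
        for my $rec (@group) {
            $rec->[$col] = $fill
                unless defined $rec->[$col] && length $rec->[$col];
        }
    }
    return @group;
}

# Read pipe-delimited records from $in, ASSUMED to be sorted by matchkey,
# and write the overlaid records to $out one group at a time.
sub overlay_stream {
    my ($in, $out) = @_;
    my (@group, $current);
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        my @f   = split /\|/, $line, -1;   # -1 keeps trailing empty fields
        my $key = $f[-1];
        if (defined $current && $key ne $current) {
            print {$out} join('|', @$_), "\n" for overlay_group(@group);
            @group = ();
        }
        $current = $key;
        push @group, \@f;
    }
    # Flush the final group.
    print {$out} join('|', @$_), "\n" for overlay_group(@group);
}

# Demo on the three records from the question (eight fields each).
my $input = join "\n",
    '1|krazken|123 Main|BFE|AR|72210|555-2345|1',
    '2|kraken||||||1',
    '3|krayken|||||555-2345|1';
open my $in, '<', \$input or die "cannot open in-memory input: $!";
overlay_stream($in, \*STDOUT);
close $in;
```

Memory use in this sketch is bounded by the largest single group rather than by the file size, at the cost of a sort pass up front. Taking the first non-empty value per column is an arbitrary choice; a real merge/purge job may prefer the longest or most recent value instead.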

Replies are listed 'Best First'.

(jeffa) Re: Merge Purge
by jeffa (Bishop) on Mar 22, 2002 at 06:07 UTC

Re: Merge Purge
by shotgunefx (Parson) on Mar 22, 2002 at 06:02 UTC
by krazken (Scribe) on Mar 22, 2002 at 15:09 UTC
by shotgunefx (Parson) on Mar 22, 2002 at 19:46 UTC

Re: Merge Purge
by abstracts (Hermit) on Mar 22, 2002 at 06:20 UTC
by Fletch (Bishop) on Mar 22, 2002 at 15:16 UTC