Does anyone have or know of an easy way to do matchgroup/overlay type of flat file I/O? Example:
## layout is as follows:
ID|name|address|city|state|zip|phone|matchkey
1|krazken|123 Main|BFE|AR|72210|555-2345|1
2|kraken||||||1
3|krayken|||||555-2345|1
Here I have 3 distinct records that I have already matched together, as noted by the matchkey. I know they are duplicates even though the name varies. My question: does anyone have a good way of grouping these records so that I can fill in the blank fields on one record from the other records in its group? Basically I want to do an overlay, for those who have heard of that before. My output should look like:
1|krazken|123 Main|BFE|AR|72210|555-2345|1
2|kraken|123 Main|BFE|AR|72210|555-2345|1
3|krayken|123 Main|BFE|AR|72210|555-2345|1
I have tried anonymous hashes keyed on the matchkey, and that works OK for small stuff, but with flat files that hold millions of records it gets expensive in a hurry, and I usually run out of memory. I have tried tying my hashes with DB_File to save memory, but that is just too slow. So I was wondering if anyone has done this kind of thing in Perl, and if so, how did you do it?
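One way to keep memory bounded, sketched here under assumptions rather than offered as a definitive answer: if the file is first sorted so that records sharing a matchkey are adjacent (for example with an external sort on the last field), only one group has to be held in memory at a time. The sub names `overlay_group` and `overlay_stream` are made up for this illustration. The sketch also assumes that the matchkey is always the last field, that there is no header line in the data, and that "first non-blank value in the group wins" is an acceptable overlay rule.

```perl
use strict;
use warnings;

# Fill blank fields in one group of records (each an array ref) with the
# first non-blank value found in that column anywhere in the group.
sub overlay_group {
    my @group = @_;
    my @best;    # first non-blank value seen per column
    for my $rec (@group) {
        for my $i (0 .. $#$rec) {
            $best[$i] = $rec->[$i]
                if !defined($best[$i]) && length($rec->[$i]);
        }
    }
    for my $rec (@group) {
        for my $i (0 .. $#$rec) {
            $rec->[$i] = $best[$i]
                if !length($rec->[$i]) && defined($best[$i]);
        }
    }
    return @group;
}

# Read pipe-delimited records from $in, ASSUMED to be sorted so that equal
# matchkeys are adjacent, and write the overlaid records to $out.
sub overlay_stream {
    my ($in, $out) = @_;
    my (@group, $current);
    my $flush = sub {
        print {$out} join('|', @$_), "\n" for overlay_group(@group);
        @group = ();
    };
    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;
        my @f   = split /\|/, $line, -1;    # -1 keeps trailing empty fields
        my $key = $f[-1];                   # assumption: matchkey is last
        $flush->() if defined($current) && $key ne $current;
        $current = $key;
        push @group, \@f;
    }
    $flush->();    # emit the final group
}
```

To use it as a filter, a final line such as `overlay_stream(\*STDIN, \*STDOUT);` could be added and the script fed pre-sorted input, e.g. `sort -t'|' -k8,8 input.txt | perl overlay.pl` (both file names are hypothetical). Memory use then depends on the largest single match group, not on the size of the whole file.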

TIA krazken

In reply to Merge Purge by krazken
