It may be impossible to get an optimal result, but I think this heuristic might be quite successful:

Keep three hashes of arrays (HoA), one per field. Each hash is keyed by the values of its field, and each value points to an array of the CSV lines that have this value. Only lines that are currently part of the minimal set are stored in these arrays.

Loop through the files with a module like Text::CSV or Parse::CSV.

For every line, check whether each of its three field values is already a key in the corresponding hash. If any of the three values is missing, add the line to all three hashes (i.e. push the line onto the array at hash1{$field_value1}, and likewise for hash2 and hash3).
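A minimal sketch of the three hashes and the "add if anything is missing" step. It assumes each record is already parsed into an array ref of the three field values; @seen and add_if_needed() are names I made up for illustration.

```perl
use strict;
use warnings;

# One hash of arrays per field: field value => [ selected records with that value ].
# @seen and add_if_needed() are hypothetical names for this sketch.
my @seen = ( {}, {}, {} );

# Add a record (array ref of three field values) if at least one of its
# values is not covered yet. Returns 1 if the record was added, 0 otherwise.
sub add_if_needed {
    my ($rec) = @_;

    # Count the field values that no selected record covers so far.
    my $missing = grep { !exists $seen[$_]{ $rec->[$_] } } 0 .. 2;
    return 0 unless $missing;

    # Register the record under all three of its values.
    push @{ $seen[$_]{ $rec->[$_] } }, $rec for 0 .. 2;
    return 1;
}

my @added = map { add_if_needed($_) } ( [qw(a x y)], [qw(a x z)], [qw(a x y)] );
print "@added\n";    # prints "1 1 0"
```

The third record is skipped because a, x and y are all covered by the first one.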

If all three field values are already in the hashes, check whether adding this line would allow you to drop two or more other lines from them. Let's call the three values of your line a, b and c. Now hash1{a} points to a line (a,x,y), possibly to several. Such a line can be removed if x in hash2 has at least two lines in its array and y in hash3 has at least two lines, because then x and y stay covered without it and a is covered by the new line. Do the same with b in hash2 and c in hash3. If you collected more than one line to remove, add the new line and remove them; replacing a single line gains nothing. There is the complication that you could find the same line more than once in the three searches, and that removing one line lowers the counts another candidate relied on, so you have to be careful about the edge cases and re-check the counts before each removal.
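Here is a rough sketch of that swap check, again on records that are array refs of three values. All sub names are hypothetical. It departs slightly from the description: it adds the new line first and then tests each candidate against the current state of the hashes, so a line that an earlier removal has made necessary is not dropped, and it undoes everything if fewer than two lines could go.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Field index => { field value => [ selected records ] }.
my @seen = ( {}, {}, {} );

sub add_rec {
    my ($rec) = @_;
    push @{ $seen[$_]{ $rec->[$_] } }, $rec for 0 .. 2;
}

sub drop_rec {
    my ($rec) = @_;
    for my $i ( 0 .. 2 ) {
        my $list = $seen[$i]{ $rec->[$i] };
        @$list = grep { refaddr($_) != refaddr($rec) } @$list;
    }
}

# True if each value of $rec is also held by at least one other selected record.
sub is_redundant {
    my ($rec) = @_;
    my $sole = grep { @{ $seen[$_]{ $rec->[$_] } } < 2 } 0 .. 2;
    return $sole == 0;
}

# Assumes all three values of $new are already covered. Returns the records
# that were dropped in favour of $new, or an empty list if nothing changed.
sub try_swap {
    my ($new) = @_;

    # Candidates: selected records sharing a value with $new, each listed once.
    my ( %uniq, @cand );
    for my $i ( 0 .. 2 ) {
        for my $rec ( @{ $seen[$i]{ $new->[$i] } || [] } ) {
            push @cand, $rec unless $uniq{ refaddr($rec) }++;
        }
    }

    add_rec($new);
    my @dropped;
    for my $rec (@cand) {
        next unless is_redundant($rec);    # checked against the current counts
        drop_rec($rec);
        push @dropped, $rec;
    }
    return @dropped if @dropped >= 2;

    # Fewer than two lines saved: undo the tentative changes.
    drop_rec($new);
    add_rec($_) for @dropped;
    return;
}

add_rec($_) for [qw(d p u)], [qw(a p u)], [qw(d b u)], [qw(d p c)];
my @dropped = try_swap( [qw(a b c)] );
print scalar(@dropped), "\n";    # prints 3
```

In this example the four selected lines shrink to (d,p,u) and (a,b,c), which together still cover every value of all three fields.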


In reply to Re: Most efficient record selection method? by jethro
in thread Most efficient record selection method? by Kraythorne
