Data validation is not performed in the generation application. Instead, we've written a few companion scripts that read in the generated data, perform some integrity checks, and compare each record against every other record read in.

Each line of data contains three records, so the script splits each line and computes a SHA-1 digest of each record. Each digest is written to a file, and File::Sort is then used to sort that file. After that, it's a simple matter of reading the sorted file line by line and comparing each line against the previous one. Sorting places identical digests next to each other, so a duplicate record shows up as two consecutive equal lines.
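For illustration, here is a minimal Python sketch of the split / hash / sort / adjacent-compare pipeline described above. The real scripts are Perl and use File::Sort, which can sort files too large for memory. Here the hypothetical `sort_file` simply sorts in memory as a stand-in. The tab delimiter between records is also an assumption, since the post does not say how the three records on a line are separated.

```python
import hashlib
import os
import tempfile

# Assumption: the three records on each line are tab-separated.
DELIMITER = "\t"


def write_digests(lines, digest_path):
    """Split each line into records and write one SHA-1 hex digest per record."""
    with open(digest_path, "w") as out:
        for line in lines:
            for record in line.rstrip("\n").split(DELIMITER):
                out.write(hashlib.sha1(record.encode("utf-8")).hexdigest() + "\n")


def sort_file(path):
    """Hypothetical stand-in for File::Sort: sorts the digest file in memory."""
    with open(path) as f:
        digests = f.readlines()
    digests.sort()
    with open(path, "w") as f:
        f.writelines(digests)


def find_duplicate_digests(sorted_path):
    """Scan the sorted file; equal adjacent lines mean duplicate records."""
    duplicates = []
    previous = None
    with open(sorted_path) as f:
        for line in f:
            digest = line.rstrip("\n")
            # Report each duplicated digest once, however many times it repeats.
            if digest == previous and (not duplicates or duplicates[-1] != digest):
                duplicates.append(digest)
            previous = digest
    return duplicates


# Usage: "alpha" appears in both lines, so exactly one duplicate is found.
data = ["alpha\tbeta\tgamma\n", "delta\talpha\tepsilon\n"]
path = os.path.join(tempfile.mkdtemp(), "digests.txt")
write_digests(data, path)
sort_file(path)
dups = find_duplicate_digests(path)
print(dups == [hashlib.sha1(b"alpha").hexdigest()])  # True
```

Only the previous line has to be kept in memory during the final scan, which is what makes the sort-then-compare approach cheap compared with holding every record in a hash.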

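I chose to compute a SHA for each record because the digest is significantly smaller than the record (a SHA-1 digest is always 20 bytes, or 40 hex characters), and two different records are extremely unlikely to produce the same SHA value. Collisions are possible in principle, so matching digests are not strictly guaranteed to mean identical records, but for non-adversarial data that risk is negligible.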
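To make the size argument concrete, the short Python sketch below shows that the SHA-1 hex digest stays 40 characters no matter how long the input record is. The record contents are made up for illustration.

```python
import hashlib


def sha1_hex(record: str) -> str:
    """Return the SHA-1 digest of a record as a 40-character hex string."""
    return hashlib.sha1(record.encode("utf-8")).hexdigest()


short_record = "x"
long_record = "x" * 10_000  # hypothetical large record

for record in (short_record, long_record):
    print(len(record), len(sha1_hex(record)))
# 1 40
# 10000 40
```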


Guildenstern
Negated character class uber alles!

In reply to (Guildenstern) Re: Re: Taming a memory hog by Guildenstern
in thread Taming a memory hog by Guildenstern
