Hello everyone,

I have two data files that I need to match against each other, and then print a results file with the matches found plus some metadata (counters, etc.).

The first data file contains company names with date ranges, and the other data file contains multiple date values (one per line). I need to match each of these values against the ranges in the first file and then create the results file.

__datafile1__
ABC Corp.
1 200210014 200210105 some text description
2 200211011 200212053 some text description
3 200323021 200331234 some text description
XYZ Ltd.
1 200210014 200210105 some text description
2 200211011 200212053 some text description
CDC Inc.
1 200110014 200110325 some text description
2 200534011 200577234 some text description
3 200212344 200232399 some text description
4 199989987 199999991 some text description

__datafile2__
ID,Address,MoreData
200110100,some text here,etc
200918943,some text here,etc
200211015,some text here,etc
199212395,some text here,etc
200110100,some text here,etc
200210100,some text here,etc
...

datafile2 is 80+MB!

I have now been able to load datafile1 into a record structure (with some help from a monk ;)) and I can get the matches to work. What I am wondering is whether there is a better way to do this, both in terms of code and algorithm.

The logic I used was to load the data into a record. For example:

$VAR1 = {
  'ABC Corp.' => {
    '20021' => {
      'sid' => '200220014',
      'eid' => '200221011'
    },
    '20022' => {
      'sid' => '200211011',
      'eid' => '200212053'
    },
    '20023' => {
      'sid' => '200323021',
      'eid' => '200331234'
    }
  }
  etc...
};
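A minimal sketch of how a record of this shape might be built from datafile1-style input. The helper name `load_ranges`, the line-format rules, and the way the hash key is formed (first four digits of the start value followed by the row id) are assumptions made for illustration; the actual loading code is not shown in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: parse datafile1-style lines into a nested record.
# Assumption: a line of the form "id start end description" is a range line;
# any other non-blank line is treated as a company name.
sub load_ranges {
    my ($fh) = @_;
    my (%record, $company);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                 # skip blank lines
        if ($line =~ /^(\d+)\s+(\d+)\s+(\d+)\b/) {
            my ($id, $sid, $eid) = ($1, $2, $3);
            next unless defined $company;          # ignore ranges before any company
            my $key = substr($sid, 0, 4) . $id;    # assumed key: year . id
            $record{$company}{$key} = { sid => $sid, eid => $eid };
        }
        else {
            $company = $line;
        }
    }
    return \%record;
}

# Small in-memory sample taken from datafile1 above.
my $sample = <<'END';
ABC Corp.
1 200210014 200210105 some text description
2 200211011 200212053 some text description
XYZ Ltd.
1 200210014 200210105 some text description
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $record = load_ranges($fh);
close $fh;

print "$_\n" for sort keys %$record;
```

Reading from a filehandle keeps the helper usable both with a real file and with an in-memory string, as in the sample.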

... then, for each of the date IDs in datafile2, I have to search the record to find a match. Notice that I used a hash key that has the date and the id concatenated, so that I can search by year first and then look closer if need be.
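One possible alternative to searching the nested record for every value, sketched under assumptions: flatten the ranges into a list sorted by start value, read datafile2 one line at a time (so the 80+MB file is never held in memory), and cache the result per ID since IDs can repeat. The names `find_matches`, `%cache`, and `%count`, the results-line format, and the choice to report every company whose range contains the value are illustrative, not the code actually used here.

```perl
use strict;
use warnings;

# Record in the same shape as the dump above; values taken from datafile1.
my %record = (
    'ABC Corp.' => {
        '20021' => { sid => '200210014', eid => '200210105' },
        '20022' => { sid => '200211011', eid => '200212053' },
    },
    'CDC Inc.' => {
        '20011' => { sid => '200110014', eid => '200110325' },
    },
);

# Flatten to [start, end, company] and sort by start value.
my @ranges;
for my $company (keys %record) {
    for my $r (values %{ $record{$company} }) {
        push @ranges, [ $r->{sid}, $r->{eid}, $company ];
    }
}
@ranges = sort { $a->[0] <=> $b->[0] } @ranges;

my %cache;    # ID => array ref of matching companies

# Return all companies whose range contains $value.
sub find_matches {
    my ($value) = @_;
    return $cache{$value} if exists $cache{$value};
    my @hits;
    for my $r (@ranges) {
        last if $r->[0] > $value;                   # later ranges start too late
        push @hits, $r->[2] if $value <= $r->[1];
    }
    return $cache{$value} = \@hits;
}

# In-memory stand-in for datafile2; a real run would open the file instead.
my $data = <<'END';
ID,Address,MoreData
200110100,some text here,etc
200918943,some text here,etc
200211015,some text here,etc
200110100,some text here,etc
END

open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my $header = <$in>;                                 # skip the header line
my %count = (lines => 0, matched => 0, unmatched => 0);
my @results;
while (my $line = <$in>) {
    chomp $line;
    my ($id) = split /,/, $line, 2;
    next unless defined $id && $id =~ /^\d+$/;      # skip malformed lines
    $count{lines}++;
    my $hits = find_matches($id);
    if (@$hits) {
        $count{matched}++;
        push @results, "$id," . join('|', sort @$hits);
    }
    else {
        $count{unmatched}++;
    }
}
close $in;

print "$_\n" for @results;
print "lines=$count{lines} matched=$count{matched} unmatched=$count{unmatched}\n";
```

With only a handful of ranges, the early exit on the sorted list is probably enough; if datafile1 grew large, a binary search or an interval tree might be worth considering instead.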

Is there a better way? I can paste some code later if you need clarification on what I did.

Thanks!

David


In reply to Code efficiency / algorithm by dave8775
