The way you describe it, it looks like each record in datafile2 could (and often will) match more than one record in datafile1. Do you intend for your results file to show all the "file1" matches for a given "file2" record?

Apart from that, it looks like the nine-digit strings labeled "sid" and "eid" in the file1 (%VAR1) data structure are supposed to be the cue for deciding whether a given record in "file2" is a match, based on its first field. That is, the first line in your "datafile2" example, which starts with "200110100", ought to be a match for the first date range in all three company records from "datafile1". Have I got that right? (The post is a bit confusing, because the Data::Dumper-like output doesn't match the sample file excerpt.)

If so, then my first inclination would be to make the "join" data the outermost layer of the "file1" data structure, so that the matches are as easy as possible to identify. Something like this (based on the data in your example "file1" excerpt):

```perl
$VAR1 = {
    '200210014 200210105' => [
        "ABC Corp. / 1 / some text description",
        "XYZ Ltd. / 1 / some text description",
        "CDC Inc. / 1 / some text description",
    ],
    '200211011 200212053' => [
        "ABC Corp. / 2 / some text description",
        "XYZ Ltd. / 2 / some text description",
    ],
    '200323021 200331234' => [
        "ABC Corp. / 3 / some text description",
    ],
    # etc...
};
```
In other words, file1 fills a hash of arrays, where the hash keys are "start_id end_id" for each date range found in file1. Each hash value is a reference to an array of one or more company records, and each record can be just a single structured string holding whatever is relevant for your results file.
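Here is a minimal sketch of how file1 might fill that hash of arrays. Since the real file1 layout isn't shown here, it assumes a hypothetical comma-separated layout (`company,range_no,start_id,end_id,description`), and it reads from an in-memory string instead of a real file so that it runs on its own:

```perl
use strict;
use warnings;

# Hypothetical file1 layout (an assumption, not the poster's real format):
#   company,range_no,start_id,end_id,description
my $file1_text = <<'END';
ABC Corp.,1,200210014,200210105,some text description
ABC Corp.,2,200211011,200212053,some text description
ABC Corp.,3,200323021,200331234,some text description
XYZ Ltd.,1,200210014,200210105,some text description
XYZ Ltd.,2,200211011,200212053,some text description
CDC Inc.,1,200210014,200210105,some text description
END

# In-memory filehandle, so the sketch needs no real file on disk.
open(my $fh1, '<', \$file1_text) or die "cannot open in-memory file1: $!";

my %VAR1;    # "start_id end_id" => [ "company / range_no / description", ... ]
while (my $line = <$fh1>) {
    chomp $line;
    next unless length $line;    # skip blank lines
    my ($company, $range_no, $sid, $eid, $desc) = split(/,/, $line, 5);
    # Autovivification creates the array the first time a range is seen,
    # so every company sharing that range lands in the same array.
    push @{ $VAR1{"$sid $eid"} }, "$company / $range_no / $desc";
}
close($fh1);
```

The records inside each array stay in file1 order, so ABC Corp. comes first under '200210014 200210105', followed by XYZ Ltd. and CDC Inc.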

With this sort of data structure from file1, you can now read file2 and use the first field of each line to jump directly to the relevant file1 data (untested code, naturally):

```perl
while (<FILE2>) {
    chomp;    # drop the trailing newline so it doesn't end up in $data
    my ($key2, $data) = split(/,/, $_, 2);

    # use grep to do the "join": keep every "sid eid" key whose range contains $key2
    my @match_keys = grep {
        my ($sid, $eid) = split(/ /, $_);
        $key2 >= $sid and $key2 <= $eid
    } keys %VAR1;

    foreach my $matched_range (@match_keys) {
        my @matched_data = @{ $VAR1{$matched_range} };
        # do something with @matched_data
    }
}
```
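The grep above re-splits every "sid eid" key for every file2 line. One possible variation (my own tweak, not something from your post) is to split the keys once, before the loop. This self-contained sketch uses a tiny sample %VAR1 and made-up file2 lines; the names @ranges and %results are hypothetical, for illustration only:

```perl
use strict;
use warnings;

# A small %VAR1 in the shape described above (sample values only).
my %VAR1 = (
    '200210014 200210105' => [ 'ABC Corp. / 1 / some text description',
                               'XYZ Ltd. / 1 / some text description' ],
    '200211011 200212053' => [ 'ABC Corp. / 2 / some text description' ],
);

# Split each "sid eid" key once, up front: each entry is [ sid, eid, key ].
my @ranges = map { [ split(/ /, $_), $_ ] } keys %VAR1;

# Made-up file2 content: the first field is the id to join on.
my $file2_text = "200210050,first record\n200212000,second record\n200400000,no match\n";
open(my $fh2, '<', \$file2_text) or die "cannot open in-memory file2: $!";

my %results;    # file2 id => [ matched file1 records ]
while (my $line = <$fh2>) {
    chomp $line;
    my ($key2, $data) = split(/,/, $line, 2);    # $data is unused in this sketch
    my @matched = map  { @{ $VAR1{ $_->[2] } } }  # expand each matching range's records
                  grep { $key2 >= $_->[0] and $key2 <= $_->[1] } @ranges;
    $results{$key2} = \@matched;
}
close($fh2);

for my $id (sort keys %results) {
    print "$id: ", (join('; ', @{ $results{$id} }) || 'no match'), "\n";
}
```

This still checks every range for every file2 line, so the work grows with (file2 lines) × (number of ranges); it only saves the repeated splitting.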

In reply to Re: Code efficiency / algorithm by graff
in thread Code efficiency / algorithm by dave8775
