Apart from that, it looks like the nine-digit strings labeled "sid" and "eid" in the file1 (%VAR1) data structure are supposed to be the cue for deciding whether a given record in "file2" is a match, based on its first field -- that is, the first line in your "datafile2" example, which starts with "200110100", ought to be a match for the first date range in all three company records from "datafile1". Have I got that right? (The post is a bit confusing, because the Data::Dumper-like output doesn't match the sample file excerpt.)
If so, then my first inclination would be to make the "join" data the outermost layer of the "file1" data structure, and make it as easy as possible to identify the matches -- something like this (based on the data in your example "file1" excerpt):
    $VAR1 = {
        '200210014 200210105' => [
            "ABC Corp. / 1 / some text description",
            "XYZ Ltd. / 1 / some text description",
            "CDC Inc. / 1 / some text description",
        ],
        '200211011 200212053' => [
            "ABC Corp. / 2 / some text description",
            "XYZ Ltd. / 2 / some text description",
        ],
        '200323021 200331234' => [
            "ABC Corp. / 3 / some text description",
        ],
        etc...
    }

In other words, file1 fills a hash of arrays, where the hash keys are "start_id end_id" for each date range found in file1; each of these hash elements holds an array of one or more company records, where each record is potentially just a single structured string, holding whatever is relevant for your results file.
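For what it's worth, filling that hash of arrays might look roughly like the sketch below. The comma-separated file1 layout (company, sequence, start_id, end_id, description), the sample lines, and the name %ranges are all assumptions made for illustration; the real file1 format will differ, so only the push onto the "start_id end_id" key is the point here.

```perl
use strict;
use warnings;

# Hypothetical file1 layout (an assumption, not taken from the original post):
#   company,sequence,start_id,end_id,description
my @file1_lines = (
    "ABC Corp.,1,200210014,200210105,some text description",
    "XYZ Ltd.,1,200210014,200210105,some text description",
    "ABC Corp.,2,200211011,200212053,some text description",
);

my %ranges;    # "start_id end_id" => [ record strings ]
for my $line (@file1_lines) {
    my ($company, $seq, $sid, $eid, $desc) = split /,/, $line, 5;

    # Autovivification creates the array ref the first time a key is seen,
    # so every record sharing a date range lands in the same array.
    push @{ $ranges{"$sid $eid"} }, "$company / $seq / $desc";
}
```

Records that share a date range end up grouped under one key, which is what lets the file2 pass look at each range only once.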
With this sort of data structure from file1, you can now read file2 and use the first field of each line to jump directly to the relevant file1 data (untested code, naturally):
    while (<FILE2>) {
        my ($key2,$data) = split(/,/, $_, 2);
        # use grep to do the "join":
        my @match_keys = grep {
            my ($sid,$eid) = split(/ /,$_);
            $key2 >= $sid and $key2 <= $eid
        } keys %VAR1;
        foreach my $matched_range ( @match_keys ) {
            my @matched_data = @{$VAR1{$matched_range}};
            # do something with @matched_data
        }
    }
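Here is a self-contained variant of the same idea that can be run without any input files. The in-memory sample data, the helper name records_for_id, and the %result hash are hypothetical and only there to make the sketch runnable; the range test itself is the same inclusive numeric comparison as in the loop above.

```perl
use strict;
use warnings;

# Sample "file1" structure: "start_id end_id" => [ record strings ]
my %ranges = (
    '200210014 200210105' => [
        "ABC Corp. / 1 / some text description",
        "XYZ Ltd. / 1 / some text description",
    ],
    '200211011 200212053' => [
        "ABC Corp. / 2 / some text description",
    ],
);

# Return every file1 record whose range contains the given id (inclusive).
sub records_for_id {
    my ($id) = @_;
    my @hits;
    for my $range (sort keys %ranges) {
        my ($sid, $eid) = split / /, $range;
        push @hits, @{ $ranges{$range} } if $id >= $sid and $id <= $eid;
    }
    return @hits;
}

# Hypothetical "file2" lines: the first comma-separated field is the id.
my @file2_lines = (
    "200210050,first row data",
    "200301001,second row data",
);

my %result;    # file2 id => [ matching file1 records ]
for my $line (@file2_lines) {
    my ($key2, $data) = split /,/, $line, 2;
    $result{$key2} = [ records_for_id($key2) ];
}
```

Note that this still scans every range key for each file2 line, so the cost grows with the number of distinct ranges; that should be fine when file1 holds a modest number of ranges, but it is not a true constant-time lookup.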
In reply to Re: Code efficiency / algorithm
by graff
in thread Code efficiency / algorithm
by dave8775