A few questions about the structure of your files:
- Are the source files sorted by the key value? If so, you may be able to use a binary search.
- Is the data static? Perhaps some other storage format for the file (database, DBM::Deep, etc) might be a 'better' way of storing the data. Additionally, if a text format is the 'correct' way of storing the data, perhaps pre-sorting it would provide the ability to use a more efficient search algorithm.
- Are the records fixed size? If so, the binary search (above) is easier to implement; otherwise you will probably need to index your text files. You would do that by reading through one of the files and recording the byte offset (tell) at which each key's line starts, then using that offset (seek) to read only the single line you need back into memory. If the data files have a very large number of lines, even the indexes may exhaust your available memory.
- If both files are sorted by the keys, you can just step through them in tandem, skipping records that are missing from one or the other until you reach the end of the data. This would enable you to only have one line from each file in memory at a time, and make only a single pass through the data files.
- If you are on Unix, this could be accomplished with the sort and join commands without the memory constraints, given sufficient disk space.
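The indexing idea from the list above (record each key's offset with tell, then seek straight back to it) might look roughly like this. It is only a sketch: the sub names `build_offset_index` and `lookup_value` are made up for illustration, the demo uses an in-memory filehandle instead of your real file, and it assumes '*'-delimited lines with the key in the first field and unique keys.

```perl
use strict;
use warnings;

# Build a key => byte-offset index for a '*'-delimited file.
# (Hypothetical helper; assumes the key is the first field and is unique.)
sub build_offset_index {
    my ($fh) = @_;
    my %offset_for;
    seek($fh, 0, 0) or die "Can't rewind: $!";
    while (1) {
        my $pos  = tell($fh);    # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        my ($key) = split /\*/, $line, 2;
        $offset_for{$key} = $pos;
    }
    return \%offset_for;
}

# Fetch the value for one key by seeking directly to its line.
sub lookup_value {
    my ($fh, $index, $key) = @_;
    return unless exists $index->{$key};
    seek($fh, $index->{$key}, 0) or die "Can't seek: $!";
    my $line = <$fh>;
    chomp $line;
    my (undef, $value) = split /\*/, $line;
    return $value;
}

# Demo on an in-memory "file" so the sketch runs on its own.
my $data = "k1*red\nk2*green\nk3*blue\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";
my $index = build_offset_index($fh);
print lookup_value($fh, $index, 'k2'), "\n";    # prints "green"
```

Only the keys and offsets are held in memory, not the records themselves. As noted above, with enough lines even that hash can get too big.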
The way you currently have this implemented is approximately O(N**2) (assuming that the number of lines in each file is approximately equal), because file2 is reread from the start for every line of file1. For data files on disk, this is not a good situation. With data files that are already sorted, the comparison pass drops to O(N), which is about as good as you are going to get.
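The single-pass, in-tandem approach for sorted files could be sketched as follows. This is an illustration, not a drop-in replacement: `read_update` and `merge_sorted` are hypothetical names, and the code assumes both files are sorted by key in plain string order (the order Perl's `lt` uses) with unique keys in each file. Like your script, it prints every file1 record and replaces the second field only when file2 has a matching key; file2 records with no match in file1 are skipped.

```perl
use strict;
use warnings;

# Read one "key*value" record from file2; returns an empty list at EOF.
sub read_update {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return split /\*/, $line, -1;    # -1 keeps trailing empty fields
}

# Step through both sorted handles together, holding one line of each.
# file1 lines: key*vf1*vf2    file2 lines: key*value
sub merge_sorted {
    my ($fh1, $fh2, $out) = @_;
    my ($key2, $value) = read_update($fh2);
    while (my $line = <$fh1>) {
        chomp $line;
        my ($key1, $vf1, $vf2) = split /\*/, $line, -1;

        # Skip file2 records whose keys sort before the current file1 key.
        while (defined $key2 and $key2 lt $key1) {
            ($key2, $value) = read_update($fh2);
        }

        # On a match, take the value from file2.
        $vf1 = $value if defined $key2 and $key2 eq $key1;
        print {$out} join('*', $key1, $vf1, $vf2), "\n";
    }
    return;
}

# Demo on in-memory "files" so the sketch runs on its own.
my $file1 = "a*old*x\nb*old*y\n";
my $file2 = "b*new\nc*unused\n";
open my $in1, '<', \$file1 or die "Can't open file1 data: $!";
open my $in2, '<', \$file2 or die "Can't open file2 data: $!";
merge_sorted($in1, $in2, \*STDOUT);
```

Each file is read exactly once, and file2 is never rewound, so memory use stays at one line per file regardless of file size.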
Update: I reread the original code and saw what it actually does, rather than what it was apparently intended to do:
    open my $if1, '<', $input_f1 or die "Can't open $input_f1: $!\n";
    open my $if2, '<', $input_f2 or die "Can't open $input_f2: $!\n";
    while (<$if1>) {    # Read each line of file1
        my $line = $_;
        chomp($line);
        my ($key1, $vf1, $vf2) = split(/\*/, $line);
        seek($if2, 0, 0);    # Make sure the file handle points to the beginning of the file
        while (<$if2>) {    # Read each line of file2
            my $line2 = $_;
            chomp($line2);
            my ($key2, $value) = split(/\*/, $line2);
            if ($key1 eq $key2) {
                $vf1 = $value;
    ############ <strike>
    #       } else {
    #           $vf1 = ' ';
    ############ </strike>
            }
        }
    ############ <add>
        print join( '*', $key1, $vf1, $vf2 ), "\n";
    ############ </add>
    }
The inner loop does not quite do what you say you want it to do. You will only get an updated value for the last key in file1, and only then if the last key in file2 is also the same. Otherwise, you are clearing each and every value for $vf1, because the else branch resets it on every line of file2 that does not match, including any lines after a match. Strike out the marked section, and I think your script's logic will be correct, although it will not be quick on larger data sets.