Something like the code below reduces it to one pass. It assumes that the
two files are both pre-sorted on the key field.
The idea is to maintain a buffer holding the window of adjacent lines in
the second file that share the current key. As the key increases, the
current buffer is thrown away and the next chunk of lines is read in
(stopping when the key changes). Then
read the first file one line at a time and get its key. If the key is
less than the buffer's current key, print the line; if it's greater,
print the accumulated lines from the second file and refill the buffer. If
they're the same, print the current line from file 1 joined with each of
the lines in the buffer. The code below doesn't actually work yet: it needs
more work to ensure that the buffer is flushed at the right times, etc., and
it doesn't handle EOFs correctly. But I'm supposed to be working rather than
messing about on PerlMonks...
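#!/usr/bin/perl
use strict;
use warnings;

# Both inputs must already be sorted on the key (first space-separated field).
open my $f1, '<', 'a' or die "Cannot open 'a': $!";
open my $f2, '<', 'b' or die "Cannot open 'b': $!";

# $key2/@rest2:   the current block of file-2 lines that share one key
# $nkey2/$nrest2: the single line read ahead, belonging to the next block
my ($key2, @rest2, $nkey2, $nrest2);

# read in next N lines from f2 that have the same key
sub get_next_block {
    @rest2 = ();
    while (1) {
        if (defined $nkey2) {
            push @rest2, $nrest2;
            $key2 = $nkey2;
        }
        my $line2 = <$f2>;
        # known gap: at EOF the read-ahead state is not reset
        return 0 unless defined $line2;
        # limit of 2 keeps the whole remainder of the line, spaces included
        ($nkey2, $nrest2) = split / /, $line2, 2;
        chomp $nrest2;
        last if defined $key2 && $nkey2 ne $key2;
    }
}

get_next_block();

while (defined (my $line1 = <$f1>)) {
    my ($key1, $rest1) = split / /, $line1, 2;
    chomp $rest1;
    if ($key1 gt $key2) {
        # known gap: the block is printed even if it was already joined,
        # and $line1 itself is dropped by the 'next'
        print "$key2 $_\n" for @rest2;
        get_next_block();
        next;
    }
    if ($key1 lt $key2) {
        # no partner in file 2: pass the line through unchanged
        print $line1;
        next;
    }
    # keys match: join this line with every buffered file-2 line
    print "$key1 $rest1 $_\n" for @rest2;
}

# known gap: the buffered block and the read-ahead line are not flushed here
print while (<$f1>);
print while (<$f2>);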
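For what it's worth, here is one way the missing flush and EOF handling might be filled in. It is only a sketch under some assumptions that are mine rather than anything stated above: lines look like "key rest" with a single space after the key, there are no blank lines, keys compare as strings, and a file 2 block is printed on its own only if no file 1 line ever matched it. The names next_block and merge_sorted are made up for this example, and the file names come from the command line instead of being hard-coded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the next run of lines sharing one key from $fh.
# $pending is a reference to the one line read ahead by the previous call.
# Returns ($key, \@rests), or an empty list at end of file.
sub next_block {
    my ($fh, $pending) = @_;
    my $line = defined($$pending) ? $$pending : <$fh>;
    $$pending = undef;
    return unless defined $line;
    chomp $line;
    my ($key, $rest) = split / /, $line, 2;
    my @rests = (defined $rest ? $rest : '');
    while (defined(my $next = <$fh>)) {
        chomp $next;
        my ($nkey, $nrest) = split / /, $next, 2;
        if ($nkey ne $key) {
            $$pending = $next;    # first line of the following block
            last;
        }
        push @rests, defined $nrest ? $nrest : '';
    }
    return ($key, \@rests);
}

# Hypothetical driver: one pass over both sorted inputs, written to $out.
sub merge_sorted {
    my ($fh1, $fh2, $out) = @_;
    my $pending;
    my ($key2, $rests2) = next_block($fh2, \$pending);
    my $matched = 0;    # did any file-1 line join with the current block?

    while (defined(my $line1 = <$fh1>)) {
        chomp $line1;
        my ($key1, $rest1) = split / /, $line1, 2;
        $rest1 = '' unless defined $rest1;

        # Move past file-2 blocks whose key is behind key1,
        # printing those that never found a partner.
        while (defined $key2 && $key2 lt $key1) {
            unless ($matched) { print {$out} "$key2 $_\n" for @$rests2; }
            ($key2, $rests2) = next_block($fh2, \$pending);
            $matched = 0;
        }

        if (defined $key2 && $key2 eq $key1) {
            print {$out} "$key1 $rest1 $_\n" for @$rests2;
            $matched = 1;
        }
        else {
            print {$out} "$line1\n";    # no partner in file 2
        }
    }

    # File 1 is exhausted: flush what is left of file 2.
    while (defined $key2) {
        unless ($matched) { print {$out} "$key2 $_\n" for @$rests2; }
        ($key2, $rests2) = next_block($fh2, \$pending);
        $matched = 0;
    }
}

# Runs only when two file names are given on the command line.
if (@ARGV == 2) {
    open my $f1, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $f2, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    merge_sorted($f1, $f2, \*STDOUT);
}
```

The main difference from the code above is that the current file 1 line is never skipped: the inner while loop advances file 2 until its key has caught up, and only then is the line either joined or passed through. Keeping the read-ahead line in $pending also means nothing is lost when either file hits EOF.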