Something like the code below reduces it to one pass. It assumes that the two files are both pre-sorted on the key field.
The idea is to maintain a buffer holding a window of all the adjacent lines in the second file that share the current key. As the key increases, the current buffer is thrown away and the next chunk of lines is read in (stopping when the key changes). Then read the first file one line at a time and get its key. If the key is less than the buffer's current key, print the line; if it's greater, print the accumulated lines from the second file and refill the buffer. If they're the same, print the current line from file 1 joined with each of the lines in the buffer. The code below doesn't actually work yet: it needs more work to ensure that the buffer is flushed at the right times, etc., and it doesn't handle EOFs correctly. But I'm supposed to be working rather than messing about on PerlMonks...

#!/usr/bin/perl -w
use strict;

open my $f1, 'a';
open my $f2, 'b';

my ($key2, @rest2, $nkey2, $nrest2);

# read in next N lines from f2 that have the same key
sub get_next_block {
    @rest2 = ();
    while (1) {
        if (defined $nkey2) {
            push @rest2, $nrest2;
            $key2 = $nkey2;
        }
        my $line2 = <$f2>;
        return 0 unless defined $line2;
        ($nkey2, $nrest2) = split / /, $line2;
        chomp $nrest2;
        last if defined $key2 && $nkey2 ne $key2;
    }
}

get_next_block();

OUTER:
while (defined (my $line1 = <$f1>)) {
    my ($key1, $rest1) = split / /, $line1;
    chomp $rest1;
    if ($key1 gt $key2) {
        print "$key2 $_\n" for @rest2;
        get_next_block();
        next;
    }
    if ($key1 lt $key2) {
        print $line1;
        next;
    }
    print "$key1 $rest1 $_\n" for @rest2;
}
print while (<$f1>);
print while (<$f2>);
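
For anyone who wants to take the idea further, here is one possible way the flushing and EOF handling could be filled in. It is only a sketch under some assumptions of mine: the sub name merge_join and its three filehandle arguments are made up for illustration, every line is taken to be "key rest" separated by a single space, both inputs are sorted in string order (matching the gt/lt comparisons above), and a block from the second file is printed on its own only if no line from the first file matched it.

```perl
use strict;
use warnings;

# Sketch of a one-pass many-to-many join of two inputs that are both
# sorted (string order) on their first space-separated field.
# merge_join() and its arguments are hypothetical names, not part of
# the original post.
sub merge_join {
    my ($fh1, $fh2, $out) = @_;

    my ($key2, @rest2);      # current block of file 2 lines sharing $key2
    my $matched = 0;         # did any file 1 line join with this block?
    my ($nkey2, $nrest2);    # one line of lookahead from file 2

    # Read one file 2 line into the lookahead; undef it at EOF.
    my $read_ahead = sub {
        my $line = <$fh2>;
        if (defined $line) {
            chomp $line;
            ($nkey2, $nrest2) = split / /, $line, 2;
            $nrest2 //= '';
        }
        else {
            ($nkey2, $nrest2) = (undef, undef);
        }
    };

    # Load all adjacent file 2 lines that share the lookahead's key.
    # Leaves $key2 undefined once file 2 is exhausted.
    my $next_block = sub {
        ($key2, @rest2) = ($nkey2);
        $matched = 0;
        while (defined $nkey2 && $nkey2 eq $key2) {
            push @rest2, $nrest2;
            $read_ahead->();
        }
    };

    # Print the current block by itself if nothing in file 1 matched it.
    my $flush_block = sub {
        return if $matched || !defined $key2;
        print {$out} "$key2 $_\n" for @rest2;
    };

    $read_ahead->();
    $next_block->();

    while (defined(my $line1 = <$fh1>)) {
        chomp $line1;
        my ($key1, $rest1) = split / /, $line1, 2;
        $rest1 //= '';

        # Move past file 2 blocks whose key is smaller than this key.
        while (defined $key2 && $key1 gt $key2) {
            $flush_block->();
            $next_block->();
        }

        if (defined $key2 && $key1 eq $key2) {
            print {$out} "$key1 $rest1 $_\n" for @rest2;
            $matched = 1;
        }
        else {
            print {$out} "$line1\n";    # no partner in file 2
        }
    }

    # File 1 is exhausted: emit whatever is left of file 2.
    while (defined $key2) {
        $flush_block->();
        $next_block->();
    }
}
```

With the two handles opened as in the script above, it would be called as merge_join($f1, $f2, \*STDOUT). The buffer is kept until the key from the first file moves past it, so several file 1 lines with the same key all get joined against the same block.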

In reply to Re: Re: Re: Re: many to many join on text files by dave_the_m
in thread many to many join on text files by aquarium
