A couple of questions on the structures of your files:

As currently implemented, this is approximately O(N**2), assuming the number of lines in each file is roughly equal, because file2 is rescanned from the start for every line of file1. For data files on disk, this is not a good situation. If both files are sorted by key, a single merge-style pass over the two files reduces this to O(N), which is about as good as you are going to get.
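To illustrate the sorted-file idea, here is a minimal sketch of a one-pass merge. It is not the original poster's code: the sub name merge_sorted and the sample data are made up for illustration. It assumes both files are already sorted by key in plain string order (what the lt/ge operators compare), that keys in file2 are unique, that there are no blank lines, and that fields are separated by '*' as in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): walk two key-sorted
# files in step, so each file is read exactly once.
# Assumes string-ordered keys, unique keys in file2, '*' separated fields.
sub merge_sorted {
    my ($fh1, $fh2, $out) = @_;

    my $line2 = <$fh2>;    # current, not yet consumed, line of file2
    while (defined(my $line1 = <$fh1>)) {
        chomp $line1;
        my ($key1, $vf1, $vf2) = split /\*/, $line1, 3;

        # Skip file2 lines whose key sorts before the current file1 key.
        my ($key2, $value);
        while (defined $line2) {
            chomp(my $copy = $line2);
            ($key2, $value) = split /\*/, $copy, 2;
            last if $key2 ge $key1;
            $line2 = <$fh2>;
        }

        # Replace the first value only when the keys match.
        $vf1 = $value if defined $line2 && $key2 eq $key1;
        print {$out} join('*', $key1, $vf1, $vf2), "\n";
    }
    return;
}

# Small demonstration using in-memory filehandles and made-up data.
{
    my $data1 = "apple*old1*x\nbanana*old2*y\ncherry*old3*z\n";
    my $data2 = "banana*new2\ndate*new4\n";
    open my $in1, '<', \$data1 or die "Can't open in-memory file1: $!\n";
    open my $in2, '<', \$data2 or die "Can't open in-memory file2: $!\n";
    merge_sorted($in1, $in2, \*STDOUT);
    # Prints:
    #   apple*old1*x
    #   banana*new2*y
    #   cherry*old3*z
}
```

With real files you would pass the two opened handles instead of the in-memory ones. If the files are not already sorted, the cost of sorting them first has to be counted as well.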

Update: I reread the original code and saw what it actually does, rather than what it was apparently intended to do:

    open my $if1, '<', $input_f1 or die "Can't open $input_f1: $!\n";
    open my $if2, '<', $input_f2 or die "Can't open $input_f2: $!\n";

    while(<$if1>) {    # Read each line of file1
        my $line = $_;
        chomp($line);
        my ($key1, $vf1, $vf2) = split(/\*/, $line);
        seek($if2, 0, 0);    # Make sure file handle point to the beginning of the file
        while (<$if2>) {    # Read each line of file2
            my $line2 = $_;
            chomp($line2);
            my ($key2, $value) = split(/\*/, $line2);
            if ($key1 eq $key2) {
                $vf1 = $value;
    ############ <strike>
    #       } else {
    #           $vf1 = ' ';
    ############ </strike>
            }
        }
    ############ <add>
        print join( '*', $key1, $vf1, $vf2 ), "\n";
    ############ </add>
    }

The inner loop does not quite do what you say you want. The else branch resets $vf1 on every line of file2 that does not match, so a match found earlier is overwritten by the lines that follow it. You will only get an updated value for the last key in file1, and only then if the last key in file2 is also the same. Otherwise, you are clearing each and every value for $vf1. Strike out the section marked <strike> and add the print marked <add>, and I think your script's logic will be correct, although it may still be slow on larger data sets.

--MidLifeXis


In reply to Re: Indexing two large text files by MidLifeXis
in thread Indexing two large text files by never_more
