in reply to Indexing two large text files
A couple of questions about the structure of your files:
The way you currently have this implemented is approximately O(N**2) (assuming the number of lines in each file is approximately equal). For data files on disk, this is not a good situation. Sorted data files can reduce this to O(N), which is about as good as you are going to get.
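For what it's worth, a minimal sketch of that sorted-merge idea might look like the code below. It assumes both files are already sorted by key in the same (lexical) order, that file2 has at most one line per key, and that the '*'-delimited layout matches your sample code; the file names are placeholders.

use strict;
use warnings;

# Placeholder file names -- substitute your own.
my $input_f1 = 'file1.sorted';
my $input_f2 = 'file2.sorted';

open my $if1, '<', $input_f1 or die "Can't open $input_f1: $!\n";
open my $if2, '<', $input_f2 or die "Can't open $input_f2: $!\n";

my $line2 = <$if2>;                      # current line of file2, undef at EOF

while ( my $line = <$if1> ) {
    chomp $line;
    my ( $key1, $vf1, $vf2 ) = split /\*/, $line;

    # Advance file2 until its key catches up with the current file1 key.
    while ( defined $line2 ) {
        chomp $line2;
        my ( $key2, $value ) = split /\*/, $line2;
        if ( $key2 lt $key1 ) {          # file2 is behind -- read its next line
            $line2 = <$if2>;
            next;
        }
        $vf1 = $value if $key2 eq $key1; # match found -- take file2's value
        last;                            # file2 has caught up or passed; stop
    }

    print join( '*', $key1, $vf1, $vf2 ), "\n";
}

close $if1;
close $if2;

Each file is read exactly once, which is where the O(N) comes from.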
Update: I reread the original code and saw what it is actually doing, rather than the apparent intent of what it should do:
open my $if1, '<', $input_f1 or die "Can't open $input_f1: $!\n";
open my $if2, '<', $input_f2 or die "Can't open $input_f2: $!\n";

while (<$if1>) {    # Read each line of file1
    my $line = $_;
    chomp($line);
    my ($key1, $vf1, $vf2) = split(/\*/, $line);

    seek($if2, 0, 0);    # Make sure file handle point to the beginning of the file
    while (<$if2>) {     # Read each line of file2
        my $line2 = $_;
        chomp($line2);
        my ($key2, $value) = split(/\*/, $line2);
        if ($key1 eq $key2) {
            $vf1 = $value;
        ############ <strike>
        # } else {
        #     $vf1 = ' ';
        ############ </strike>
        }
    }
    ############ <add>
    print join( '*', $key1, $vf1, $vf2 ), "\n";
    ############ </add>
}
The inner loop does not quite do what you state you want to do. You will only get an updated value for the last key in file1, and only then if the last key in file2 is also the same. Otherwise, you are clearing each and every value for $vf1. Strike out the marked section, and I think your script's logic will be correct, although it may not work quickly on larger data sets.
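If sorting both files up front is not convenient and file2 is small enough to fit in memory, another common way to avoid rescanning file2 for every line of file1 (not something from your post, just a sketch with placeholder file names) is to load file2 into a hash once and do a single lookup per file1 line:

use strict;
use warnings;

# Placeholder file names -- substitute your own.
my $input_f1 = 'file1.txt';
my $input_f2 = 'file2.txt';

# Read file2 once into a hash: key => value.
open my $if2, '<', $input_f2 or die "Can't open $input_f2: $!\n";
my %value_for;
while ( my $line2 = <$if2> ) {
    chomp $line2;
    my ( $key2, $value ) = split /\*/, $line2;
    $value_for{$key2} = $value;
}
close $if2;

# Now a single pass over file1, with one hash lookup per line.
open my $if1, '<', $input_f1 or die "Can't open $input_f1: $!\n";
while ( my $line = <$if1> ) {
    chomp $line;
    my ( $key1, $vf1, $vf2 ) = split /\*/, $line;
    $vf1 = $value_for{$key1} if exists $value_for{$key1};
    print join( '*', $key1, $vf1, $vf2 ), "\n";
}
close $if1;

That trades memory for speed: one pass over each file instead of a full pass over file2 for every line of file1.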
--MidLifeXis
Re^2: Indexing two large text files
by never_more (Initiate) on Apr 09, 2012 at 13:41 UTC