The hash approach will work, but it keeps every key from the first file in memory, so it will be memory-bound if the files grow enormous. Anyway, if I were to take the hash approach, here's one way I might do it:
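    use strict;
    use warnings;

    my %indices;

    # Record every key (the first comma-separated field) from the first file.
    open my $primary, '<', 'filename.txt' or die $!;
    while ( my $line = <$primary> ) {
        chomp $line;
        my $key = ( split /,/, $line )[0];
        next unless defined $key;    # skip blank lines
        # Key on the first field, not the whole line:
        # $indices{$line} = 0 would be wrong here.
        $indices{$key} = 0;
    }
    close $primary;

    # Count how often each of those keys appears in the second file.
    open my $secondary, '<', 'filename2.txt' or die $!;
    while ( my $line = <$secondary> ) {
        chomp $line;
        my $key = ( split /,/, $line )[0];
        next unless defined $key;    # skip blank lines
        if ( exists $indices{$key} ) {
            $indices{$key}++;
        }
    }
    close $secondary;

    # Report only the keys that were found at least once.
    foreach ( keys %indices ) {
        if ( $indices{$_} > 0 ) {
            print "$_ from the first file was found ",
                $indices{$_},
                " times in the second file.\n";
        }
    }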
That's one way to do it. If your files are going to grow big enough for memory to become a concern, you would need an approach that doesn't attempt to hold the whole index in memory at once. A lightweight database like SQLite could be helpful in that regard.
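For what it's worth, here is a rough sketch of how the SQLite route might look. It assumes the DBI and DBD::SQLite modules are installed from CPAN. The count_matches sub, the indices table, its item_key and hits columns, and the index.db file name are all names I made up for illustration. The counting logic is the same as the hash version, but the index lives in a file on disk instead of in a hash.

```perl
use strict;
use warnings;
use DBI;    # assumption: DBI and DBD::SQLite are installed

# Hypothetical helper: counts how often each key from $first appears in
# $second, keeping the index in the SQLite file $db_file rather than a hash.
sub count_matches {
    my ( $first, $second, $db_file ) = @_;

    my $dbh = DBI->connect( "dbi:SQLite:dbname=$db_file", '', '',
        { RaiseError => 1, AutoCommit => 1 } );

    $dbh->do('DROP TABLE IF EXISTS indices');
    $dbh->do( 'CREATE TABLE indices '
            . '( item_key TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0 )' );

    # Load keys from the first file. One transaction keeps the inserts fast.
    my $insert = $dbh->prepare(
        'INSERT OR IGNORE INTO indices (item_key) VALUES (?)');
    $dbh->begin_work;
    open my $primary, '<', $first or die "Cannot open $first: $!";
    while ( my $line = <$primary> ) {
        chomp $line;
        my $key = ( split /,/, $line )[0];
        next unless defined $key;    # skip blank lines
        $insert->execute($key);
    }
    close $primary;
    $dbh->commit;

    # Bump the counter for each key seen in the second file.
    # Keys that are not in the table simply match no rows.
    my $update = $dbh->prepare(
        'UPDATE indices SET hits = hits + 1 WHERE item_key = ?');
    $dbh->begin_work;
    open my $secondary, '<', $second or die "Cannot open $second: $!";
    while ( my $line = <$secondary> ) {
        chomp $line;
        my $key = ( split /,/, $line )[0];
        next unless defined $key;
        $update->execute($key);
    }
    close $secondary;
    $dbh->commit;

    # Fetch only the keys that were found at least once.
    my $rows = $dbh->selectall_arrayref(
        'SELECT item_key, hits FROM indices WHERE hits > 0 ORDER BY item_key');
    $dbh->disconnect;
    return $rows;    # array ref of [ key, hits ] pairs
}

# Example use, with the file names from the hash version:
# for my $row ( @{ count_matches( 'filename.txt', 'filename2.txt', 'index.db' ) } ) {
#     print "$row->[0] from the first file was found $row->[1] times in the second file.\n";
# }
```

This trades speed for memory: every line costs a database call, so it will likely be slower than the hash on small files. Whether that is worthwhile depends on how large the files really get.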