In reply to: many to many join on text files

The following was designed to optimize for memory usage. It stores only a single hash, whose keys are the values of the key field seen in the first file, and whose values are arrays of integers (byte offsets into that file). That means if the first file has a million lines, there will be a million numbers in memory, plus the overhead of the hash and however many arrays. Despite this memory optimization, I believe it is still pretty efficient in time; the cost is a second, random-access, read of the first file.
    # first, scan the first file, noting the file pos's on which each key occurs.
    my %key_pos_in_first_file; # key=key, val=array of file positions.
    open F1, "+< $first_file" or die "open $first_file for random read - $!\n";
    my $p1 = 0;
    while (<F1>) {
        chomp;
        my @l = split /\|/;
        push @{ $key_pos_in_first_file{ $l[0] } }, $p1;
        $p1 = tell F1;
    }

    # second, go through the second file, joining.
    open F2, "< $second_file" or die "open $second_file for read - $!\n";
    while (<F2>) {
        chomp;
        my @l2 = split /\|/;
        # go to each pos in the first file and use that line
        for my $p1 ( @{ $key_pos_in_first_file{ $l2[0] } } ) {
            seek F1, $p1, 0;
            my $l1 = <F1>;
            chomp $l1;
            my @l1 = split /\|/, $l1;
            # join
            print "@l2 - @l1\n";
        }
    }
    close F2;
    close F1;
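For illustration only (the data below is made up, and $first_file / $second_file are assumed to already hold the two filenames), here is roughly what a run looks like. Say the first file contains

    1|apple
    2|banana
    1|cherry

and the second file contains

    1|red
    2|yellow

After the first loop, the index holds one byte offset per line of the first file (offsets here assume Unix line endings), which you could inspect with the core Data::Dumper module:

    use Data::Dumper;
    print Dumper \%key_pos_in_first_file;
    # $VAR1 = {
    #           '1' => [ 0, 17 ],
    #           '2' => [ 8 ]
    #         };

The second loop then seeks back to those offsets and prints one line per matching pair:

    1 red - 1 apple
    1 red - 1 cherry
    2 yellow - 2 banana

The tradeoff is deliberate: only the key and a file position are kept in memory for each line of the first file, and the non-key fields are re-read with seek only when a key from the second file actually matches.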

jdporter
The 6th Rule of Perl Club is -- There is no Rule #6.