in reply to Simple comparison of 2 files
Here's an approach that combines a while-loop with a nested for-loop and validates the input. As has been mentioned before, reading the "small" file into an array is practical for files of several million to a few score million lines, depending on how much system RAM you have available. (Of course, you may want to die rather than warn if you see an invalid input line.)
File iter_2_files_1.pl:

    use warnings;
    use strict;
    use autodie;

    use Data::Dump qw(pp);

    # data extraction and validation regexes.
    my $rx_L    = qr{ [[:upper:]] }xms;
    my $rx_N    = qr{ \d+ _ \d+ }xms;
    my $rx_line = qr{ \A ($rx_L) \s+ ($rx_N) \s* \z }xms;

    # in-memory test files (for convenience only).
    my $f_large = qq{A 1_1\nA 1_2\nB 1_3\nLaRgE\nC 1_4};
    my $f_small = qq{A 2_1\nSmAlL\nB 2_2};

    # read, validate and process small file, hold in array.
    open my $fh_small, '<', \$f_small;
    my @small_file_line_fields =
        map {
            my $valid = my ($letter, $number) = $_ =~ $rx_line;
            warn qq{bad small file line '$_'} unless $valid;
            $valid ? [ $letter, $number ] : ();
            }
        <$fh_small>
        ;
    close $fh_small;
    print 'small file: ', pp(\@small_file_line_fields), qq{\n\n};

    # process large file line-by-line.
    open my $fh_large, '<', \$f_large;
    LARGE:
    while (my $line_large = <$fh_large>) {
        my $valid = my ($large_L, $large_N) = $line_large =~ $rx_line;
        warn qq{bad large file line '$line_large'} and next LARGE unless $valid;

        # iterate over all lines of small file for each line of large file.
        SMALL:
        for my $ar_fields (@small_file_line_fields) {
            my ($small_L, $small_N) = @$ar_fields;

            printf qq{%s from %s with number %s and %s from %s with number %s },
                $large_L, 'FILE1', $large_N, $small_L, 'FILE2', $small_N;
            print 'DO NOT ' if $large_L ne $small_L;
            print qq{match \n};
            }  # end for SMALL loop
        }  # end while LARGE loop
    close $fh_large;

Output:

    c:\@Work\Perl\monks\Q.and>perl iter_2_files_1.pl
    bad small file line 'SmAlL
    ' at iter_2_files_1.pl line 107, <$_[...]> line 3.
    small file: [["A", "2_1"], ["B", "2_2"]]

    A from FILE1 with number 1_1 and A from FILE2 with number 2_1 match
    A from FILE1 with number 1_1 and B from FILE2 with number 2_2 DO NOT match
    A from FILE1 with number 1_2 and A from FILE2 with number 2_1 match
    A from FILE1 with number 1_2 and B from FILE2 with number 2_2 DO NOT match
    B from FILE1 with number 1_3 and A from FILE2 with number 2_1 DO NOT match
    B from FILE1 with number 1_3 and B from FILE2 with number 2_2 match
    bad large file line 'LaRgE
    ' at iter_2_files_1.pl line 121, <$_[...]> line 4.
    C from FILE1 with number 1_4 and A from FILE2 with number 2_1 DO NOT match
    C from FILE1 with number 1_4 and B from FILE2 with number 2_2 DO NOT match
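If you would rather die than warn on an invalid line, as mentioned at the top of this post, something along these lines might do. This is only a sketch: parse_line_strict() is a hypothetical helper name that is not part of the script above, and the regex is simply the script's three regexes folded into one. The message ends in a newline so that die does not append the file and line number; drop the newline if you want them.

```perl
use warnings;
use strict;

# Same line format as the script above: one uppercase letter,
# whitespace, then digits_digits.
my $rx_line = qr{ \A ([[:upper:]]) \s+ (\d+ _ \d+) \s* \z }xms;

# parse_line_strict() is a hypothetical helper (an assumption for this
# sketch). It returns (letter, number) for a valid line and dies otherwise.
sub parse_line_strict {
    my ($line, $source) = @_;

    my @fields = $line =~ $rx_line;
    return @fields if @fields;

    chomp(my $shown = $line);    # keep the error message on one line
    die qq{bad $source line '$shown'\n};
}

my ($letter, $number) = parse_line_strict("A 2_1\n", 'small file');
print "$letter $number\n";

# eval traps the exception so that this demo can carry on past the bad line;
# leave the eval out if a bad line should stop the whole run.
my $ok = eval { parse_line_strict("SmAlL\n", 'small file'); 1 };
print "caught: $@" unless $ok;
```

In the script above, the map block and the top of the LARGE loop are the two places where such a helper could replace the warn-and-skip logic.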
Give a man a fish: <%-{-{-{-<