use strict;
use warnings;
use Tie::Hash::Indexed;

tie my %lines1, 'Tie::Hash::Indexed';    # gives you the ordered hash

open my $IN1, '<', "tmp12"           or die "Cannot open tmp12: $!";
open my $IN2, '<', "donor_82_01.csv" or die "Cannot open donor_82_01.csv: $!";

# step 1, cache contents of $IN1 (read the first file once)
# populate %lines1 "cache"
while ( my $item1 = <$IN1> ) {
    chomp $item1;                        # so a key in the last column carries no "\n"
    my @tmp1 = split( /\t+/, $item1 );
    $lines1{ $tmp1[1] } = \@tmp1;        # save the split fields of $item1, keyed on $tmp1[1]
}
close $IN1;

# step 2, iterate over contents of $IN2 / look up in %lines1 to compare
open my $OUT, '>', "tmp12_01" or die "Cannot open tmp12_01: $!";
LOOKUP_AND_COMPARE:
while ( my $item2 = <$IN2> ) {
    #chomp $item2;    # not needed, see the print below
    my @tmp2 = split( /,+/, $item2 );

    # -- look up ("ref" returns 'ARRAY' only when the key was cached)
    if ( 'ARRAY' eq ref $lines1{ $tmp2[0] } ) {
        my @tmp1 = @{ $lines1{ $tmp2[0] } };    # for clarity, not actually needed; can get value via "$lines1{ $tmp2[0] }->[0]"
        print $OUT $tmp1[0], ",", $item2;      # updated to fix bareword from old code
        next LOOKUP_AND_COMPARE;               # a "last" here would stop reading $IN2 after the first match
    }
}
#print $OUT "\n";    # probably don't need if you don't "chomp $item2"
close $IN2;
close $OUT;
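To check the two-step idea without touching the real files, here is a minimal self-contained sketch of the same hash join on in-memory lines. The helper name join_lines and the sample rows are hypothetical, and it uses a plain hash instead of Tie::Hash::Indexed because output order follows file 2 anyway.

```perl
use strict;
use warnings;

# Hypothetical helper: join tab-separated lines (keyed on column 1)
# with comma-separated lines (keyed on column 0), one pass over each list.
sub join_lines {
    my ( $file1_lines, $file2_lines ) = @_;

    # step 1: cache file 1 in a hash, keyed on its second column
    my %lines1;
    for my $item1 (@$file1_lines) {
        chomp( my $line = $item1 );
        my @tmp1 = split /\t+/, $line;
        $lines1{ $tmp1[1] } = \@tmp1;
    }

    # step 2: one pass over file 2, one hash lookup per line
    my @out;
    for my $item2 (@$file2_lines) {
        my @tmp2 = split /,+/, $item2;
        my $row  = $lines1{ $tmp2[0] };
        next unless 'ARRAY' eq ref $row;        # no match: skip this line
        push @out, $row->[0] . ',' . $item2;    # $item2 keeps its newline
    }
    return @out;
}

# Hypothetical sample input standing in for "tmp12" and "donor_82_01.csv"
my @file1 = ( "chr1\tgeneA\n", "chr2\tgeneB\n" );
my @file2 = ( "geneB,42\n", "geneC,7\n", "geneA,13\n" );
print join_lines( \@file1, \@file2 );
```

Only geneB and geneA have a partner in the first list, so two joined lines come out, in file-2 order.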
Additional optimizations, depending on your constraint (time versus space):
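As one illustration of the space side of that trade-off (a sketch of our own, not necessarily one of the optimizations the author had in mind), the cache can hold only the single field that gets printed instead of an array reference to every field, and both inputs can be read one line at a time. The helpers build_cache and join_stream and the sample data are hypothetical; the in-memory filehandles (open on a scalar reference) only stand in for the real files.

```perl
use strict;
use warnings;

# Keep only column 0 of file 1, keyed on column 1, instead of all fields.
sub build_cache {
    my ($fh) = @_;
    my %first_col_by_key;
    while ( my $line = <$fh> ) {          # reads one line at a time
        chomp $line;
        my ( $first, $key ) = split /\t+/, $line;
        next unless defined $key;         # skip blank or one-column lines
        $first_col_by_key{$key} = $first;
    }
    return \%first_col_by_key;
}

# Stream file 2 and emit "column0,original line" for every cached key.
sub join_stream {
    my ( $cache, $fh ) = @_;
    my @out;
    while ( my $item2 = <$fh> ) {
        my ($key) = split /,+/, $item2;
        push @out, "$cache->{$key},$item2"
            if defined $key && exists $cache->{$key};
    }
    return @out;
}

# Hypothetical data read through in-memory filehandles
my $file1 = "chr1\tgeneA\textra\nchr2\tgeneB\n";
my $file2 = "geneB,42\ngeneA,13\n";
open my $in1, '<', \$file1 or die "Cannot open in-memory file: $!";
open my $in2, '<', \$file2 or die "Cannot open in-memory file: $!";
print join_stream( build_cache($in1), $in2 );
```

The lookup cost is unchanged; the saving is that each cached entry is one short string rather than an array of every column.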
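The lesson here, as stated below, is not to nest your loops. The concept is called computational complexity: looping over every line of the first file for each line of the second does on the order of n × m comparisons, whereas reading each file once does roughly n + m operations. Basically, you want at most one level of looping over the data. The line, if ( 'ARRAY' eq ref $lines1{ $tmp2[0] } ) {, is the (expected) constant-time lookup made possible by caching the first file in a hash, and it is how you avoid the inner loop. The ref matters: without it, the string 'ARRAY' is compared against a stringified reference such as "ARRAY(0x...)" and never matches. The tie to Tie::Hash::Indexed only preserves insertion order; the fast lookup comes from the hash itself.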
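To make the n × m versus n + m difference concrete, this sketch counts the work each approach does on the same keys. The functions nested_comparisons and hashed_operations and the generated keys are hypothetical; they count operations rather than time, so the numbers illustrate the growth rate, not real runtimes.

```perl
use strict;
use warnings;

# Nested loops: each key of file 2 is compared against keys of file 1
# until a match is found (the inner "last" only leaves the inner loop).
sub nested_comparisons {
    my ( $keys1, $keys2 ) = @_;
    my $count = 0;
    for my $k2 (@$keys2) {
        for my $k1 (@$keys1) {
            $count++;
            last if $k1 eq $k2;
        }
    }
    return $count;
}

# Cached hash: one insert per key of file 1, one lookup per key of file 2.
sub hashed_operations {
    my ( $keys1, $keys2 ) = @_;
    my ( $count, $hits ) = ( 0, 0 );
    my %seen;
    for my $k1 (@$keys1) { $seen{$k1} = 1; $count++ }
    for my $k2 (@$keys2) { $count++; $hits++ if exists $seen{$k2} }
    return $count;
}

# Hypothetical worst case: 1000 keys per file, none of them matching
my @keys1 = map { "a$_" } 1 .. 1000;
my @keys2 = map { "b$_" } 1 .. 1000;
printf "nested: %d, hashed: %d\n",
    nested_comparisons( \@keys1, \@keys2 ),
    hashed_operations( \@keys1, \@keys2 );
```

With no matches the nested version makes 1,000,000 comparisons while the hash version does 2,000 operations; doubling both files quadruples the first number but only doubles the second.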
In reply to Re: match two files by perlfan, in thread match two files by yueli711