in reply to match two files
use strict;
use warnings;
use Tie::Hash::Indexed;

tie my %lines1, 'Tie::Hash::Indexed';    # gives you the ordered hash

open my $IN1, '<', "tmp12"           or die "Cannot open this file: $!";
open my $IN2, '<', "donor_82_01.csv" or die "Cannot open this file: $!";

# step 1, cache contents of $IN1 (read the first file once)
# populate %lines1 "cache"
while ( my $item1 = <$IN1> ) {
    chomp $item1;    # keeps a trailing newline out of the key
    my @tmp1 = split( /\t+/, $item1 );
    $lines1{ $tmp1[1] } = \@tmp1;    # save the split $item1 line, keyed on $tmp1[1]
}

# step 2, iterate over contents of $IN2 / look up in %lines1 to compare
open my $OUT, '>', "tmp12_01" or die "Cannot open this file: $!";

LOOKUP_AND_COMPARE:
while ( my $item2 = <$IN2> ) {
    #chomp $item2;    # not needed, see last line
    my @tmp2 = split( /,+/, $item2 );

    # -- look up
    if ( ref( $lines1{ $tmp2[0] } ) eq 'ARRAY' ) {
        my @tmp1 = @{ $lines1{ $tmp2[0] } };    # for clarity, not actually needed; can get value via "$lines1{ $tmp2[0] }->[0]"
        print $OUT $tmp1[0], ",", $item2;       # <- updated to fix bareword from old code
        next LOOKUP_AND_COMPARE;    # "last" here would stop after the first match
    }
}
#print $OUT "\n";    # probably don't need if you don't "chomp $item2"
Additional optimizations, depending on your constraint (time versus space):
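The lesson here, as stated below, is to not nest your loops. The concept is called "computational complexity". A loop over the second file nested inside a loop over the first does work proportional to the product of the two line counts; caching the first file once and then making a single pass over the second does work proportional to their sum. Basically, you want at most one level of looping. The if test on $lines1{ $tmp2[0] } is the "constant time" lookup that caching the first file in the hash above gives you, and it is how you avoid the inner loop.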
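To make the single-pass idea concrete, here is a minimal sketch that uses in-memory arrays in place of the two files. The sample data and the sub names build_index and match_lines are made up for illustration, and a plain hash stands in for the tied ordered hash, since insertion order does not matter for the lookup itself.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the tab-separated and comma-separated files.
my @file1 = ( "geneA\tid1\n", "geneB\tid2\n", "geneC\tid3\n" );
my @file2 = ( "id2,foo\n", "id9,bar\n", "id1,baz\n" );

# One pass over file 1: key each split row on its second column.
sub build_index {
    my @rows = @_;
    my %index;
    for my $row (@rows) {
        chomp( my $copy = $row );    # work on a copy, leave the input alone
        my @fields = split /\t+/, $copy;
        $index{ $fields[1] } = \@fields;
    }
    return \%index;
}

# One pass over file 2: each match is a hash lookup, not an inner loop.
sub match_lines {
    my ( $index, @rows ) = @_;
    my @out;
    for my $row (@rows) {
        my ($key) = split /,/, $row;    # first comma-separated field
        next unless exists $index->{$key};
        push @out, $index->{$key}[0] . "," . $row;
    }
    return @out;
}

my @matched = match_lines( build_index(@file1), @file2 );
print @matched;
```

Each file is walked exactly once, so the work grows with the total number of lines rather than with the product of the two line counts. The cost is memory: the whole first file sits in the hash, which is the time versus space trade-off.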