in reply to match two files

Here's how I'd do it (to be clear, this is basically what the first reply suggested). The code is untested:
    use strict;
    use warnings;
    use Tie::Hash::Indexed;

    tie my %lines1, 'Tie::Hash::Indexed';    # gives you the ordered hash

    open my $IN1, '<', "tmp12"           or die "Cannot open this file: $!";
    open my $IN2, '<', "donor_82_01.csv" or die "Cannot open this file: $!";

    # step 1, cache contents of $IN1 (read the first file once)
    # populate %lines1 "cache"
    while ( my $item1 = <$IN1> ) {
        chomp $item1;    # keep a trailing newline out of the key
        my @tmp1 = split( /\t+/, $item1 );
        $lines1{ $tmp1[1] } = \@tmp1;    # save the fields of $item1, keyed on $tmp1[1]
    }

    # step 2, iterate over contents of $IN2 / look up in %lines1 to compare
    open my $OUT, '>', "tmp12_01" or die "Cannot open this file: $!";

    while ( my $item2 = <$IN2> ) {
        #chomp $item2;    # not needed, see last line
        my @tmp2 = split( /,+/, $item2 );

        # -- look up
        if ( ref( $lines1{ $tmp2[0] } ) eq 'ARRAY' ) {
            my @tmp1 = @{ $lines1{ $tmp2[0] } };    # for clarity, not actually needed; can get value via "$lines1{ $tmp2[0] }->[0]"
            print $OUT $tmp1[0], ",", $item2;       #<-updated to fix bareword from old code
            # no "last" here: with a single loop it would stop after the first match
        }
    }
    #print $OUT "\n";    # probably don't need if you don't "chomp $item2"
    close $OUT or die "Cannot close output file: $!";

Additional optimizations, depending on your constraint (time versus space):

The lesson here, as stated below, is not to nest your loops. The issue is computational complexity: a loop inside a loop compares every line of one file against every line of the other, so the work grows with the product of the two file sizes. You basically want at most one level of looping per file. The hash lookup on $lines1{ $tmp2[0] } in the if condition is the constant-time lookup that caching the first file provides, and it is how you avoid the inner loop.
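To make the one-pass-per-file idea concrete, here is a minimal sketch that runs without any input files. The sample lines, the sub name match_lines, and the variable names are made up for illustration. It also assumes a plain core hash is enough, since the lookup itself does not depend on insertion order.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two input files.
my @file1_lines = ( "geneA\tid1\n", "geneB\tid2\n", "geneC\tid3\n" );
my @file2_lines = ( "id2,10\n", "id9,20\n", "id3,30\n" );

# Returns the matched output lines, using one pass over each input.
sub match_lines {
    my ( $first, $second ) = @_;

    # Pass 1: index the first file by its second tab-separated column.
    my %by_key;
    for my $line (@$first) {
        chomp( my $copy = $line );
        my @fields = split /\t+/, $copy;
        $by_key{ $fields[1] } = \@fields;
    }

    # Pass 2: each line of the second file costs one hash lookup,
    # instead of a scan over every line of the first file.
    my @out;
    for my $line (@$second) {
        my ($key) = split /,+/, $line;
        next unless exists $by_key{$key};
        push @out, $by_key{$key}[0] . "," . $line;
    }
    return @out;
}

print match_lines( \@file1_lines, \@file2_lines );
```

With N lines in the first file and M in the second, this does roughly N + M steps, where the nested-loop version does up to N * M comparisons.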

Re^2: match two files
by hippo (Archbishop) on Jun 03, 2020 at 13:55 UTC
            print OUT $tmp1[0], ",", $item2;

    There is no bareword filehandle OUT anywhere else in your code. Perhaps you meant $OUT? warnings catches these.

      Good catch. For the OP's benefit, I added
      use strict; use warnings;
      and fixed the bareword filehandle. I missed that when updating their code. :) ty
Re^2: match two files
by yueli711 (Sexton) on Jun 04, 2020 at 04:57 UTC

    Hello perlfan, Thank you so much for your useful code! I already ran "sudo cpan Tie::File::AsHash", but I still get this error. Thank you again, it is really appreciated!

    li@lix:~$ perl match11.pl
    Can't locate Tie/Hash/Indexed.pm in @INC (you may need to install the Tie::Hash::Indexed module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at match11.pl line 4.
    BEGIN failed--compilation aborted at match11.pl line 4.

      "I already $ sudo cpan Tie::File::AsHash It still got this error."

      This module is not used by the code you thanked perlfan for. The error suggests you install Tie::Hash::Indexed, which has many install failures.

        Hello marto, Thank you so much for your useful suggestion! I still have some errors. Thank you again and really appreciated! Best, Yue

        li@li-HP$ sudo cpan Tie::Hash::Indexed
        Loading internal null logger. Install Log::Log4perl for logging messages
        Reading '/home/li/.cpan/Metadata'
          Database was generated on Thu, 04 Jun 2020 02:41:02 GMT
        Running install for module 'Tie::Hash::Indexed'
        Checksum for /home/li/.cpan/sources/authors/id/M/MH/MHX/Tie-Hash-Indexed-0.05.tar.gz ok
        'YAML' not installed, will not store persistent state
        Configuring M/MH/MHX/Tie-Hash-Indexed-0.05.tar.gz with Makefile.PL
        Setting license tag...
        Checking if your kit is complete...
        Looks good
        Generating a Unix-style Makefile
        Writing Makefile for Tie::Hash::Indexed
        Writing MYMETA.yml and MYMETA.json
          MHX/Tie-Hash-Indexed-0.05.tar.gz
          /usr/bin/perl Makefile.PL INSTALLDIRS=site -- OK
        Running make for M/MH/MHX/Tie-Hash-Indexed-0.05.tar.gz
        cp lib/Tie/Hash/Indexed.pm blib/lib/Tie/Hash/Indexed.pm
        Running Mkbootstrap for Indexed ()
        chmod 644 "Indexed.bs"
        "/usr/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- Indexed.bs blib/arch/auto/Tie/Hash/Indexed/Indexed.bs 644
        "/usr/bin/perl" "/usr/share/perl/5.26/ExtUtils/xsubpp" -typemap '/usr/share/perl/5.26/ExtUtils/typemap' -typemap '/home/li/.cpan/build/Tie-Hash-Indexed-0.05-2/typemap' Indexed.xs > Indexed.xsc
        mv Indexed.xsc Indexed.c
        x86_64-linux-gnu-gcc -c -I. -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -DVERSION=\"0.05\" -DXS_VERSION=\"0.05\" -fPIC "-I/usr/lib/x86_64-linux-gnu/perl/5.26/CORE" -DNDEBUG Indexed.c
        rm -f blib/arch/auto/Tie/Hash/Indexed/Indexed.so
        x86_64-linux-gnu-gcc -shared -L/usr/local/lib -fstack-protector-strong Indexed.o -o blib/arch/auto/Tie/Hash/Indexed/Indexed.so \
          \
        chmod 755 blib/arch/auto/Tie/Hash/Indexed/Indexed.so
        Manifying 1 pod document
          MHX/Tie-Hash-Indexed-0.05.tar.gz
          /usr/bin/make -- OK
        Running make test
        "/usr/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- Indexed.bs blib/arch/auto/Tie/Hash/Indexed/Indexed.bs 644
        PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
        t/101_basic.t ..... 1/32
        # Failed test 8 in t/101_basic.t at line 43
        # t/101_basic.t line 43 is: skip($scalar, $s =~ /^(\d+)\/\d+$/ && $1 == scalar keys %h);
        # Failed test 11 in t/101_basic.t at line 49
        # t/101_basic.t line 49 is: skip($scalar, $s =~ /^(\d+)\/\d+$/ && $1 == scalar keys %h);
        # Failed test 14 in t/101_basic.t at line 55
        # t/101_basic.t line 55 is: skip($scalar, $s =~ /^(\d+)\/\d+$/ && $1 == scalar keys %h);
        t/101_basic.t ..... Failed 3/32 subtests
        t/102_storable.t .. ok
        t/103_bugs.t ...... ok

        Test Summary Report
        -------------------
        t/101_basic.t (Wstat: 0 Tests: 32 Failed: 3)
          Failed tests: 8, 11, 14
        Files=3, Tests=69, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.02 cusr 0.00 csys = 0.04 CPU)
        Result: FAIL
        Failed 1/3 test programs. 3/69 subtests failed.
        Makefile:1002: recipe for target 'test_dynamic' failed
        make: *** [test_dynamic] Error 255
          MHX/Tie-Hash-Indexed-0.05.tar.gz
          /usr/bin/make test -- NOT OK
        //hint// to see the cpan-testers results for installing this module, try:
          reports MHX/Tie-Hash-Indexed-0.05.tar.gz

      The error message which you quoted not only tells you what's wrong but even goes so far as to suggest what you may need to do in order to fix it. Did you read it? Did you do what it suggested? What happened then?