Re: match two files

Here's how I'd do it (to be clear, this is basically what the first reply suggested). Code is untested:

use strict;
use warnings;
use Tie::Hash::Indexed;
tie my %lines1, 'Tie::Hash::Indexed';    # gives you the ordered hash

open my $IN1, '<', "tmp12"           or die "Cannot open tmp12: $!";
open my $IN2, '<', "donor_82_01.csv" or die "Cannot open donor_82_01.csv: $!";

# step 1, cache contents of $IN1 (read the first file once)

# populate %lines1 "cache"
while ( my $item1 = <$IN1> ) {
    chomp $item1;                    # keep the newline out of the last field
    my @tmp1 = split( /\t+/, $item1 );
    $lines1{ $tmp1[1] } = \@tmp1;    # save the fields of $item1, keyed on $tmp1[1]
}

# step 2, iterate over contents of $IN2 / look up in %lines1 to compare

open my $OUT, '>', "tmp12_01" or die "Cannot open tmp12_01: $!";
LOOKUP_AND_COMPARE:
while ( my $item2 = <$IN2> ) {
    
    #chomp $item2;       # not needed, see last line
    my @tmp2 = split( /\,+/, $item2 );
    
    # -- look up
    if ( 'ARRAY' eq ref $lines1{ $tmp2[0] } ) {
        # for clarity, not actually needed; can get value via "$lines1{ $tmp2[0] }->[0]"
        my @tmp1 = @{ $lines1{ $tmp2[0] } };
        print $OUT $tmp1[0], ",", $item2;    # <- updated to fix bareword from old code
        last LOOKUP_AND_COMPARE;    # stops after the first match; drop this line to report every match
    }
}

#print $OUT "\n";        # probably don't need if you don't "chomp $item2"

Additional optimizations, depending on your constraint (time versus space):

if time, cache the larger of the 2 files
if space, cache the smaller of the 2 files

The lesson here, as stated below, is to not nest your loops. This is a matter of "computational complexity": comparing every line of one file against every line of the other costs on the order of N x M comparisons, while caching one file in a hash and looping over the other once costs roughly N + M. So you basically want at most 1 level of looping. The if test on $lines1{ $tmp2[0] } is the "constant time" lookup that the hash cache of the first file provides, and it is how you avoid the inner loop.
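To make the single-loop idea concrete, here is a minimal sketch on in-memory data. The two arrays are made-up stand-ins for the files (not the OP's data), and it uses a plain hash instead of Tie::Hash::Indexed, on the assumption that the output order only needs to follow the second file:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two files (assumed layout, not the OP's data).
my @file1 = ( "a1\tk1\n", "a2\tk2\n", "a3\tk3\n" );    # tab-separated, key in column 2
my @file2 = ( "k2,x\n", "k9,y\n", "k3,z\n" );          # comma-separated, key in column 1

# Pass 1: cache the first "file" in a plain hash, keyed on its second column.
my %lines1;
for my $item1 (@file1) {
    chomp( my $line = $item1 );
    my @tmp1 = split /\t+/, $line;
    $lines1{ $tmp1[1] } = \@tmp1;
}

# Pass 2: one hash lookup per line of the second "file", no inner loop.
my @matches;
for my $item2 (@file2) {
    my @tmp2 = split /,+/, $item2;
    next unless exists $lines1{ $tmp2[0] };
    push @matches, $lines1{ $tmp2[0] }->[0] . "," . $item2;
}

print @matches;
```

Each array is walked exactly once, so the work grows with the combined number of lines rather than with their product.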


Replies are listed 'Best First'.
Re^2: match two files by hippo (Archbishop) on Jun 03, 2020 at 13:55 UTC
`print OUT $tmp1[0], ",", $item2;` There is no bareword filehandle `OUT` anywhere else in your code. Perhaps you meant `$OUT`? warnings catches these.
Re^3: match two files by perlfan (Parson) on Jun 03, 2020 at 14:09 UTC
Good catch. For the OP's benefit, added `use strict; use warnings;` and fixed the bareword filehandle. Missed that when updating their code. :) ty
Re^2: match two files by yueli711 (Sexton) on Jun 04, 2020 at 04:57 UTC
Hello perlfan, Thank you so much for your useful code! I already ran `$ sudo cpan Tie::File::AsHash`, but I still get this error. Thank you again and really appreciated!
`li@lix:~$ perl match11.pl`
`Can't locate Tie/Hash/Indexed.pm in @INC (you may need to install the Tie::Hash::Indexed module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at match11.pl line 4.`
`BEGIN failed--compilation aborted at match11.pl line 4.`
Re^3: match two files by marto (Cardinal) on Jun 04, 2020 at 11:43 UTC
"I already $ `sudo cpan Tie::File::AsHash` It still got this error." This module is not used by the code you thanked perlfan for. The error suggests you install Tie::Hash::Indexed, which has many install failures.
Re^4: match two files by yueli711 (Sexton) on Jun 05, 2020 at 02:53 UTC
Hello marto, Thank you so much for your useful suggestion! I still have some errors. Thank you again and really appreciated! Best, Yue

li@li-HP$ sudo cpan Tie::Hash::Indexed
Loading internal null logger. Install Log::Log4perl for logging messages
Reading '/home/li/.cpan/Metadata'
  Database was generated on Thu, 04 Jun 2020 02:41:02 GMT
Running install for module 'Tie::Hash::Indexed'
Checksum for /home/li/.cpan/sources/authors/id/M/MH/MHX/Tie-Hash-Indexed-0.05.tar.gz ok
'YAML' not installed, will not store persistent state
Configuring M/MH/MHX/Tie-Hash-Indexed-0.05.tar.gz with Makefile.PL
Setting license tag...
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Tie::Hash::Indexed
Writing MYMETA.yml and MYMETA.json
  MHX/Tie-Hash-Indexed-0.05.tar.gz
  /usr/bin/perl Makefile.PL INSTALLDIRS=site -- OK
Running make for M/MH/MHX/Tie-Hash-Indexed-0.05.tar.gz
cp lib/Tie/Hash/Indexed.pm blib/lib/Tie/Hash/Indexed.pm
Running Mkbootstrap for Indexed ()
chmod 644 "Indexed.bs"
"/usr/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- Indexed.bs blib/arch/auto/Tie/Hash/Indexed/Indexed.bs 644
"/usr/bin/perl" "/usr/share/perl/5.26/ExtUtils/xsubpp" -typemap '/usr/share/perl/5.26/ExtUtils/typemap' -typemap '/home/li/.cpan/build/Tie-Hash-Indexed-0.05-2/typemap' Indexed.xs > Indexed.xsc
mv Indexed.xsc Indexed.c
x86_64-linux-gnu-gcc -c -I. -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g -DVERSION=\"0.05\" -DXS_VERSION=\"0.05\" -fPIC "-I/usr/lib/x86_64-linux-gnu/perl/5.26/CORE" -DNDEBUG Indexed.c
rm -f blib/arch/auto/Tie/Hash/Indexed/Indexed.so
x86_64-linux-gnu-gcc -shared -L/usr/local/lib -fstack-protector-strong Indexed.o -o blib/arch/auto/Tie/Hash/Indexed/Indexed.so \
  \
chmod 755 blib/arch/auto/Tie/Hash/Indexed/Indexed.so
Manifying 1 pod document
  MHX/Tie-Hash-Indexed-0.05.tar.gz
  /usr/bin/make -- OK
Running make test
"/usr/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- Indexed.bs blib/arch/auto/Tie/Hash/Indexed/Indexed.bs 644
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/101_basic.t ..... 1/32
# Failed test 8 in t/101_basic.t at line 43
# t/101_basic.t line 43 is: skip($scalar, $s =~ /^(\d+)\/\d+$/ && $1 == scalar keys %h);
# Failed test 11 in t/101_basic.t at line 49
# t/101_basic.t line 49 is: skip($scalar, $s =~ /^(\d+)\/\d+$/ && $1 == scalar keys %h);
# Failed test 14 in t/101_basic.t at line 55
# t/101_basic.t line 55 is: skip($scalar, $s =~ /^(\d+)\/\d+$/ && $1 == scalar keys %h);
t/101_basic.t ..... Failed 3/32 subtests
t/102_storable.t .. ok
t/103_bugs.t ...... ok
Test Summary Report
-------------------
t/101_basic.t (Wstat: 0 Tests: 32 Failed: 3)
  Failed tests: 8, 11, 14
Files=3, Tests=69, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.02 cusr 0.00 csys = 0.04 CPU)
Result: FAIL
Failed 1/3 test programs. 3/69 subtests failed.
Makefile:1002: recipe for target 'test_dynamic' failed
make: *** [test_dynamic] Error 255
  MHX/Tie-Hash-Indexed-0.05.tar.gz
  /usr/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports MHX/Tie-Hash-Indexed-0.05.tar.gz
Re^5: match two files by marto (Cardinal) on Jun 05, 2020 at 06:16 UTC
Re^3: match two files by hippo (Archbishop) on Jun 04, 2020 at 09:02 UTC
The error message which you quoted not only tells you what's wrong but even goes so far as to suggest what you may need to do in order to fix it. Did you read it? Did you do what it suggested? What happened then?