
Here's how I'd do it (to be clear, this was basically suggested in the first reply); code untested:

use strict;
use warnings;
use Tie::Hash::Indexed;
tie my %lines1, 'Tie::Hash::Indexed';    # gives you the ordered hash

open my $IN1, '<', "tmp12"           or die "Cannot open tmp12: $!";
open my $IN2, '<', "donor_82_01.csv" or die "Cannot open donor_82_01.csv: $!";

# step 1, cache contents of $IN1 (read the first file once)

# populate %lines1 "cache"
while ( my $item1 = <$IN1> ) {
    chomp $item1;    # so a key taken from the last column carries no newline
    my @tmp1 = split( /\t+/, $item1 );
    $lines1{ $tmp1[1] } = \@tmp1;    # save the fields of $item1, keyed on $tmp1[1]
}

# step 2, iterate over contents of $IN2 / look up in %lines1 to compare

open my $OUT, '>', "tmp12_01" or die "Cannot open tmp12_01: $!";
LOOKUP_AND_COMPARE:
while ( my $item2 = <$IN2> ) {

    #chomp $item2;       # not needed, see last line
    my @tmp2 = split( /,+/, $item2 );
    
    # -- look up 
    if ( exists $lines1{ $tmp2[0] } ) {
        my @tmp1 = @{ $lines1{ $tmp2[0] } };    # for clarity, not actually needed; can get value via "$lines1{ $tmp2[0] }->[0]"
        print $OUT $tmp1[0], ",", $item2;       # <- updated to fix bareword from old code
        next LOOKUP_AND_COMPARE;                # "last" here would stop at the first match
    }
}

#print $OUT "\n";        # probably don't need if you don't "chomp $item2"

Additional optimizations, depending on your constraint (time versus space):

if time, cache the larger of the 2 files
if space, cache the smaller of the 2 files
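
A minimal sketch of how that choice could be automated, following the rule of thumb above. The helper name pick_cache_file and the temporary files are made up for illustration; sizes come from Perl's built-in -s file test.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: decide which of two files to load into the hash
# (the "cache") and which to read line by line.
# $constraint is 'time' or 'space', per the rule of thumb in the post.
# Returns ( $file_to_cache, $file_to_stream ).
sub pick_cache_file {
    my ( $file_a, $file_b, $constraint ) = @_;

    # -s gives the size in bytes; treat a missing or empty file as 0
    my ( $size_a, $size_b ) = map { ( -s $_ ) || 0 } $file_a, $file_b;
    my ( $small, $large ) =
      $size_a <= $size_b ? ( $file_a, $file_b ) : ( $file_b, $file_a );

    return $constraint eq 'space' ? ( $small, $large ) : ( $large, $small );
}

# Made-up demo files, removed automatically when the program exits
my ( $fh1, $name1 ) = tempfile( UNLINK => 1 );
print {$fh1} "short\n";
close $fh1;

my ( $fh2, $name2 ) = tempfile( UNLINK => 1 );
print {$fh2} "a much longer line\n" x 10;
close $fh2;

my ( $cache, $stream ) = pick_cache_file( $name1, $name2, 'space' );
print "cache:  $cache\nstream: $stream\n";
```

Note that in the code above the two files are split differently (tabs versus commas) and keyed on different columns, so swapping their roles also means swapping the split pattern and key column.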

The lesson here, as stated below, is to not nest your loops. It comes down to computational complexity: a loop nested inside another compares every line of one file against every line of the other, so the work grows with the product of the two file sizes. Basically you only want to have at most 1 level of looping. The hash lookup on $lines1{ $tmp2[0] } in the if test is the "constant time" lookup that caching the first file in a hash gives you, and it is how you avoid the inner loop.
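
To make the point about avoiding the inner loop concrete, here is a small self-contained sketch that works on in-memory lines instead of files. The helper name match_lines and the sample data are made up for illustration, and it uses a plain hash rather than Tie::Hash::Indexed, on the assumption that the order of the cache does not matter for the lookup itself.

```perl
use strict;
use warnings;

# Hypothetical helper: index the tab-separated lines by their second column,
# then make a single pass over the comma-separated lines, looking each
# first field up in the hash instead of looping over the first list again.
sub match_lines {
    my ( $tab_lines, $csv_lines ) = @_;

    # one pass over the first list to build the lookup table
    my %by_key;
    for my $line (@$tab_lines) {
        my @fields = split /\t+/, $line;
        next unless defined $fields[1];
        $by_key{ $fields[1] } = \@fields;
    }

    # one pass over the second list; each lookup is a hash access
    my @matched;
    for my $line (@$csv_lines) {
        my ($key) = split /,+/, $line;
        next unless defined $key && exists $by_key{$key};
        push @matched, join ',', $by_key{$key}[0], $line;
    }
    return \@matched;
}

# Made-up sample data, already chomped
my @tab = ( "a1\tid1", "a2\tid2", "a3\tid3" );
my @csv = ( "id2,x", "id9,y", "id1,z" );

print "$_\n" for @{ match_lines( \@tab, \@csv ) };
```

Each list is walked exactly once, so the work grows with the sum of the two sizes rather than their product.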

In reply to Re: match two files by perlfan
in thread match two files by yueli711
