in reply to Matching two files based on one column common in each

Yes, I remember this code from: Issues with Column headings.

In the original problem statement, there was a need to check whether some id exists in file2 that does not exist in file1. That is why %file1 was created.

If you look at the code you posted, there are 3 main steps: (1) make the hash %file1 (ids in file1), (2) make %file2 (ids in file2), (3) process the keys (all unique ids) in %file1. The old step (4), processing all unique ids in %file2, is not there anymore, so the data structure that existed only to support it is not needed either.

So, the %file1 hash is not needed. The idea is to combine step (1) and step (3) into a new step (3) and get rid of step (1) altogether.

Take out the step (1) code. Then modify step (3): instead of foreach my $id1 (keys %file1){...}, just use the first part of what was step (1):

while (my $row = $csv->getline($FILE1)) {   # $row is a reference to a row
    my @fields = @$row;                      # this explicitly de-references
    my $id1 = $fields[1];
    if (exists $file2{$id1}) {
        $csv->print($FILE3, ["HK", @fields]);       # both files
    } else {
        $csv->print($FILE3, ["NOT_HK", @fields]);   # file1 only
    }
}

I didn't test this, but it writes one output line for every line of file1, so an id that repeats on a different line of file1 will also be repeated in the output.

I do not know why you added "chomp $row;". That's not needed. $row is a reference to an array that the csv module creates when it reads the line from the file. The program won't bomb, but this line doesn't do anything useful.
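To make the two remaining steps concrete, here is a minimal, self-contained sketch of the idea. It is not the original program: the helper name tag_rows, its argument list, and the sample data are made up for illustration, and the in-memory filehandles stand in for the real $FILE1, $FILE2 and $FILE3. It assumes the id sits in column index 1 of both files, as in the loop above.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, as used in the thread

# Hypothetical helper: the name and signature are illustrative only.
# Reads ids from $fh2 into a hash, then streams $fh1 and writes each
# row to $out with a leading "HK" / "NOT_HK" tag.
sub tag_rows {
    my ($csv, $fh1, $fh2, $out, $id_col) = @_;

    # Step (2): remember every id seen in file2
    my %file2;
    while (my $row = $csv->getline($fh2)) {
        $file2{ $row->[$id_col] } = 1;
    }

    # New step (3): a single pass over file1, no %file1 hash needed
    while (my $row = $csv->getline($fh1)) {
        my $tag = exists $file2{ $row->[$id_col] } ? "HK" : "NOT_HK";
        $csv->print($out, [ $tag, @$row ]);    # print() wants an array ref
    }
    return;
}

# Small in-memory demonstration with made-up data
my $data1  = "a,101,x\nb,102,y\nc,101,z\n";
my $data2  = "q,101\nr,103\n";
my $result = '';

open my $FILE1, '<', \$data1  or die "file1: $!";
open my $FILE2, '<', \$data2  or die "file2: $!";
open my $FILE3, '>', \$result or die "file3: $!";

my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

tag_rows($csv, $FILE1, $FILE2, $FILE3, 1);
close $FILE3;
print $result;
```

Because file1 is streamed row by row, its rows come out in their original order, which a loop over keys %file1 would not guarantee.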

Re^2: Matching two files based on one column common in each
by bluray (Sexton) on Sep 28, 2011 at 21:09 UTC
    Hi Marshall,

    Thanks for the code. I did change the code a bit to make it work. Now I am getting the correct output file.

    while (my $row = $csv->getline($FILE1)) {    # $row is a reference to a row
        my @fields = @$row;                       # this explicitly de-references
        my $id1 = $fields[1];
        if (exists $file2{$id1}) {
            my $fields_ref = \@fields;
            unshift(@$fields_ref, "HK");
            $csv->print($FILE3, $fields_ref);
            # $csv->print ($FILE3, "HK", @fields);    #both files
        } else {
            my $fields_ref = \@fields;
            unshift(@$fields_ref, "NOT_HK");
            $csv->print($FILE3, $fields_ref);
        }
    }