In the original problem statement, there was a need to check whether some id exists in file2 that does not exist in file1. That is why %file1 was created.
If you look at the code you posted, there are 3 main steps: (1) make the hash %file1 (ids in file1), (2) make %file2 (ids in file2), (3) process the keys (all unique ids) in %file1. Step (4), processing all unique ids in %file2, is not there anymore, so the data structure it relied on is not needed either.
So, the %file1 hash is not needed. The idea is to combine step (1) and step (3) into a new step (3) and get rid of step (1) altogether.
Take out the step (1) code. Then modify step (3): instead of foreach my $id1 (keys %file1){...}, just use the first part of what was step (1):
    while (my $row = $csv->getline($FILE1)) {   # $row is a reference to a row
        my @fields = @$row;                     # this explicitly de-references
        my $id1    = $fields[1];
        if (exists $file2{$id1}) {
            # print() in Text::CSV expects an array reference as its second argument
            $csv->print($FILE3, ["HL", @fields]);       # both files
        }
        else {
            $csv->print($FILE3, ["NOT_HK", @fields]);   # file1 only
        }
    }

I didn't test this, but it should give you a repeated output line if an id in file1 repeats on a different line.
I do not know why you added "chomp $row;". That's not needed. $row is a reference to an array that the csv module creates when it reads the line from the file. The program won't bomb, but this line doesn't do anything useful.
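To make the restructuring concrete, here is a minimal sketch of the same idea with no %file1 at all: build %file2 once, then read file1 a row at a time. It is only an illustration. The sub name tag_rows and the sample rows are made up, and it uses a plain split on commas instead of Text::CSV so that it runs without any modules. For real CSV data (quoted fields, embedded commas) keep using Text::CSV as in your program.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two array refs of comma-separated lines.
# The id is assumed to be in the second column (index 1) of both files.
sub tag_rows {
    my ($file1_lines, $file2_lines) = @_;

    # Step (2): remember every id that occurs in file2.
    my %file2;
    for my $line (@$file2_lines) {
        my @fields = split /,/, $line;
        $file2{ $fields[1] } = 1;
    }

    # New step (3): go through file1 once and tag each row.
    # No %file1 hash is built, so repeated ids give repeated output rows.
    my @output;
    for my $line (@$file1_lines) {
        my @fields = split /,/, $line;
        my $tag = exists $file2{ $fields[1] } ? "HL" : "NOT_HK";
        push @output, join(",", $tag, @fields);
    }
    return @output;
}

# Made-up sample data: id1 appears twice in file1.
my @file1 = ("a,id1,10", "b,id2,20", "c,id1,30");
my @file2 = ("x,id1,foo", "y,id3,bar");

print "$_\n" for tag_rows(\@file1, \@file2);
```

With this sample data, id1 is on two lines of file1, so it produces two "HL" rows, and id2 produces one "NOT_HK" row. id3 is only in file2 and produces nothing, which is the check that step (4) used to do.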
In reply to Re: Matching two files based on one column common in each
by Marshall
in thread Matching two files based on one column common in each
by bluray