Re: Matching hash keys from different hashes and utilizing in new hash

Hi FIJI42,

If I understand correctly, you're looking for records that have the same identifier (same first column) in file 1 and file 2, and you want to output the data of those common records.

This can be much simpler.

Start by reading the first file and storing its data in a hash. Then read the second file line by line; if the identifier of a record from file 2 exists in the hash holding the data of file 1, output it in the desired format. Something like this (untested because there is not enough sample data):

#!/usr/bin/perl
use strict; 
use warnings;

my ($file1, $file2) = @ARGV;
my %hash_file1;
open my $FILE1, "<", $file1 or die "Cannot open $file1 for processing!\n";
while (my $line = <$FILE1>) {
    my ($key, @fields) = split /\s+/, $line;
    $hash_file1{$key} = join ":", @fields;
}
close $FILE1;
open my $FILE2, "<", $file2 or die "Cannot open $file2 for processing!\n";
while (my $line = <$FILE2>) {
    my ($key, @fields) = split /\s+/, $line;
    my $rest_of_line = join ":", @fields;
    if (exists $hash_file1{$key}) {         # this is a common record (same identifier)
        print $key, ":", $hash_file1{$key}, ":", $rest_of_line, "\n";
    }
} 
close $FILE2;

BTW, this should probably work with many more columns in your file.

Re^2: Matching hash keys from different hashes and utilizing in new hash by huck (Prior) on Oct 21, 2017 at 21:54 UTC
This method sort of backdoors the "header-row", in that it assumes that $key is the same in both "header-rows" and that it doesn't duplicate another valid $key in the data area. Just saying :) ("Once bitten, twice shy")
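To make the caveat concrete, here is a minimal sketch of a reader that handles the header row explicitly and reports duplicate keys instead of silently overwriting them. It assumes each file has at most one header line and that keeping the first occurrence of a duplicate is acceptable; neither is confirmed by the OP. The name read_keyed_file and its $has_header flag are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads a whitespace-separated
# file into a hash keyed on the first column. Assumes at most one header
# line, which is skipped when $has_header is true.
sub read_keyed_file {
    my ($file, $has_header) = @_;
    my %data;
    open my $fh, "<", $file or die "Cannot open $file: $!\n";
    if ($has_header) {
        my $header = <$fh>;                  # read and discard the header row
    }
    while (my $line = <$fh>) {
        my ($key, @fields) = split ' ', $line;
        next unless defined $key;            # skip blank lines
        if (exists $data{$key}) {
            # Assumption: keep the first occurrence and report the rest.
            warn "Duplicate key '$key' at $file line $.; keeping first occurrence\n";
            next;
        }
        $data{$key} = join ":", @fields;
    }
    close $fh;
    return \%data;
}
```

With this, the header never lands in the hash as if it were a data record, and any duplicated identifier shows up as a warning on STDERR.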
Re^3: Matching hash keys from different hashes and utilizing in new hash by Laurent_R (Canon) on Oct 21, 2017 at 23:01 UTC
Well, yes, maybe, but that's what I understand from the OP. I do this kind of processing (albeit usually much more complicated) all the time, but it very frequently (almost always) follows a remove-duplicates step. Here, we don't know enough about the input data.
Re^2: Matching hash keys from different hashes and utilizing in new hash by FIJI42 (Acolyte) on Oct 22, 2017 at 05:28 UTC
Thanks, this was very helpful. I forgot to add that I wanted to make a new hash with only the common keys and their associated column values (concatenated), but I believe I've got it. The reason for doing so was to split the columns apart in a subroutine I have for comparing the column values per key.
Re^3: Matching hash keys from different hashes and utilizing in new hash by Laurent_R (Canon) on Oct 22, 2017 at 07:31 UTC
Then you can just populate your new hash at the point near the end of the code where the print statement is. But maybe you don't even need to populate a new hash since, at this point in the code, you have the two keys and the two strings representing the other columns; so you could quite probably make the comparison (or call the subroutine making the comparison) right there, instead of the print statement.
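A minimal sketch of the first option, assuming the same whitespace-separated input as in the code above: the print statement is replaced by an assignment into a new hash that holds only the common keys. The names common_records and %common are hypothetical, and like the original code this is not tested against the OP's real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a hash ref holding only
# the keys present in both files, each mapped to the ":"-joined columns of
# file 1 followed by those of file 2.
sub common_records {
    my ($file1, $file2) = @_;

    my %hash_file1;
    open my $FILE1, "<", $file1 or die "Cannot open $file1: $!\n";
    while (my $line = <$FILE1>) {
        my ($key, @fields) = split ' ', $line;
        next unless defined $key;                # skip blank lines
        $hash_file1{$key} = join ":", @fields;
    }
    close $FILE1;

    my %common;
    open my $FILE2, "<", $file2 or die "Cannot open $file2: $!\n";
    while (my $line = <$FILE2>) {
        my ($key, @fields) = split ' ', $line;
        next unless defined $key;
        next unless exists $hash_file1{$key};    # not a common record
        # This is where the print statement was: store the record instead.
        $common{$key} = join ":", $hash_file1{$key}, @fields;
    }
    close $FILE2;
    return \%common;
}

# Run only when two file names are given on the command line.
if (@ARGV == 2) {
    my $common = common_records(@ARGV);
    print "$_:$common->{$_}\n" for sort keys %$common;
}
```

For the second option, the assignment into %common would simply be replaced by a call to the comparison subroutine, passing it $key, $hash_file1{$key} and the fields from file 2.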