Merging two data sets

sonicbill11 has asked for the wisdom of the Perl Monks concerning the following question:

I need to merge (or is it join?) two data sets by the id field for both data. These are fictional data sets for example.

Data set 1:
id x1 x2
A 12 45
A 10 12
A 15 74
C 11 41
C 18 12
B 23 15
D 19 10
D 11 12
E 14 5
E 34 31
F 10 45

Data set 2:
id word
A hello
B goodbye
D happy
E black
C blue

I want to keep all the data is Data 1, as long as it matches with Data 2, by id.

Desired output
A 12 45 hello
A 10 12 hello
A 15 74 hello
C 11 41 blue
C 18 12 blue
B 23 15 goodbye
D 19 10 happy
D 11 12 happy
E 14 5 black
E 34 31 black

Here is the code I am using so far:

 
my %file1hash;

open (DAT1,"$fin1");

while(my $line=<DAT1>) {
    chomp($line);
    my ($id,$x1,$x2)=split(' ' ,$line);
    $file1hash{$id} = "$x1 $x2";
}

close DAT1;

open (DAT2,"$fin2");
while (my $line=<DAT2>) {
    chomp($line);
    my ($id,$word)=split(' ', $line);
        if (exists $file1hash{$id}) {
        printf ("%1s %5s %7s\n", $id, $file1hash{$id}, $word);     
    }
}
close DAT2;

exit;
[download]

My output from this code is
A 15 74 hello
B 23 15 goodbye
D 11 12 happy
E 34 31 black
C 18 12 blue

Almost there, but it only takes the last values for each id. Any help or direction to a previously asked similar question would be appreciated. Thanks!

Comment on Merging two data sets Download Code

Replies are listed 'Best First'.
Re: Merging two data sets by kevbot (Vicar) on Jun 13, 2014 at 05:32 UTC
This can also be accomplished using the Data::Table module. This code: `#!/usr/bin/env perl use strict; use warnings; use Data::Table; my $t1 = Data::Table::fromCSV("d1.dat", 1, undef, { delimiter => " " } +); my $t2 = Data::Table::fromCSV("d2.dat", 1, undef, { delimiter => " " } +); my $t3 = $t1->join($t2, Data::Table::INNER_JOIN, [ 'id' ], ['id']); $t3->sort('id', 1, 0); print $t3->tsv; exit;` [download] gives this output: `id x1 x2 word A 12 45 hello A 10 12 hello A 15 74 hello B 23 15 goodbye C 11 41 blue C 18 12 blue D 19 10 happy D 11 12 happy E 14 5 black E 34 31 black` [download]	[reply] [d/l] [select]
Re^2: Merging two data sets by sonicbill11 (Initiate) on Jun 13, 2014 at 13:09 UTC
Great. I got it to produce the correct output based on your suggestions and code. Thank you all for your help!	[reply]
Re: Merging two data sets by LanX (Saint) on Jun 13, 2014 at 01:37 UTC
Looks for me like you do it the wrong way round. You have to read data set 2 into a hash and then parse set 1. HTH! :) Cheers Rolf (addicted to the Perl Programming Language)	[reply]
Re: Merging two data sets by vinoth.ree (Monsignor) on Jun 13, 2014 at 03:39 UTC
Hi, As suggested by LanX please find the below code `use strict; use warnings; use Data::Dumper; my %file1hash; my $fin1="./dataset1.txt"; my $fin2="./dataset2.txt"; my ($fh1,$fh2); open ($fh2, "<", "$fin2") or die $!; %file1hash = map{ chomp; split / /; } <$fh2>; print Dumper \%file1hash; close $fh2; open ($fh1, "<", "$fin1") or die $!; while(<$fh1>){ chomp; my ($id, $x1, $x2) = split / /; if (exists $file1hash{$id}) { printf ("%1s %5s %7s %10s\n", $id, $x1, $x2, $file1has +h{$id}); } } close $fh1; exit;` [download] *All is well*	[reply] [d/l]
Re: Merging two data sets by Jim (Curate) on Jun 13, 2014 at 05:06 UTC
`C:\>perl -e "system('awk \"FNR == NR { if (FNR != 1) word[$1] = $2; ne +xt } $1 in word { print $0, word[$1] }\" DataSet2.txt DataSet1.txt')" A 12 45 hello A 10 12 hello A 15 74 hello C 11 41 blue C 18 12 blue B 23 15 goodbye D 19 10 happy D 11 12 happy E 14 5 black E 34 31 black C:\>cat DataSet2.txt id word A hello B goodbye D happy E black C blue C:\>cat DataSet1.txt id x1 x2 A 12 45 A 10 12 A 15 74 C 11 41 C 18 12 B 23 15 D 19 10 D 11 12 E 14 5 E 34 31 F 10 45 C:\>` [download] And here's the pure AWK version… `C:\>awk "FNR == NR { if (FNR > 1) word[$1] = $2; next } $1 in word { p +rint $0, word[$1] }" DataSet2.txt DataSet1.txt A 12 45 hello A 10 12 hello A 15 74 hello C 11 41 blue C 18 12 blue B 23 15 goodbye D 19 10 happy D 11 12 happy E 14 5 black E 34 31 black C:\>` [download]	[reply] [d/l] [select]
Re: Merging two data sets by InfiniteSilence (Curate) on Jun 13, 2014 at 20:28 UTC
Think in the following order, Description of problem Data structures to hold data Relevant operations on those data structures to perform the desired operation/processing `#!/usr/bin/perl -w use strict; my %file1hash; my ($fin1, $fin2) = qw\|alpha.dat beta.dat\|; open (DAT1,"$fin1"); while(my $line=<DAT1>) { chomp($line); next if $line=~m/^id/; my ($id,$x1,$x2)=split(' ' ,$line); push @{$file1hash{$id}}, "$x1 $x2"; } close DAT1; open (DAT2,"$fin2"); while (my $line=<DAT2>) { chomp($line); next if $line=~m/^id/; my ($id,$word)=split(' ', $line); if (exists $file1hash{$id}) { for my $x (@{$file1hash{$id}}){ print qq\|$id\t$x\t$word\n\|; } } } close DAT2;` [download] Makes.. `A 12 45 hello A 10 12 hello A 15 74 hello B 23 15 goodbye D 19 10 happy D 11 12 happy E 14 5 black E 34 31 black C 11 41 blue C 18 12 blue` [download] Analysis: Your code was creating a hash with simple elements when what you really wanted was a hash of array references (incorrect choice of data structure). When you have > 1 element you can iterate over them and get them all. Celebrate Intellectual Diversity	[reply] [d/l] [select]