sonicbill11 has asked for the wisdom of the Perl Monks concerning the following question:

I need to merge (or is it join?) two data sets by the id field for both data. These are fictional data sets for example.

Data set 1:
id x1 x2
A 12 45
A 10 12
A 15 74
C 11 41
C 18 12
B 23 15
D 19 10
D 11 12
E 14 5
E 34 31
F 10 45

Data set 2:
id word
A hello
B goodbye
D happy
E black
C blue

I want to keep all the data is Data 1, as long as it matches with Data 2, by id.

Desired output
A 12 45 hello
A 10 12 hello
A 15 74 hello
C 11 41 blue
C 18 12 blue
B 23 15 goodbye
D 19 10 happy
D 11 12 happy
E 14 5 black
E 34 31 black

Here is the code I am using so far:
my %file1hash; open (DAT1,"$fin1"); while(my $line=<DAT1>) { chomp($line); my ($id,$x1,$x2)=split(' ' ,$line); $file1hash{$id} = "$x1 $x2"; } close DAT1; open (DAT2,"$fin2"); while (my $line=<DAT2>) { chomp($line); my ($id,$word)=split(' ', $line); if (exists $file1hash{$id}) { printf ("%1s %5s %7s\n", $id, $file1hash{$id}, $word); } } close DAT2; exit;

My output from this code is
A 15 74 hello
B 23 15 goodbye
D 11 12 happy
E 34 31 black
C 18 12 blue

Almost there, but it only takes the last values for each id. Any help or direction to a previously asked similar question would be appreciated. Thanks!

Replies are listed 'Best First'.
Re: Merging two data sets
by kevbot (Vicar) on Jun 13, 2014 at 05:32 UTC
    This can also be accomplished using the Data::Table module. This code:
    #!/usr/bin/env perl use strict; use warnings; use Data::Table; my $t1 = Data::Table::fromCSV("d1.dat", 1, undef, { delimiter => " " } +); my $t2 = Data::Table::fromCSV("d2.dat", 1, undef, { delimiter => " " } +); my $t3 = $t1->join($t2, Data::Table::INNER_JOIN, [ 'id' ], ['id']); $t3->sort('id', 1, 0); print $t3->tsv; exit;
    gives this output:
    id x1 x2 word A 12 45 hello A 10 12 hello A 15 74 hello B 23 15 goodbye C 11 41 blue C 18 12 blue D 19 10 happy D 11 12 happy E 14 5 black E 34 31 black
      Great. I got it to produce the correct output based on your suggestions and code. Thank you all for your help!
Re: Merging two data sets
by LanX (Saint) on Jun 13, 2014 at 01:37 UTC
    Looks for me like you do it the wrong way round.

    You have to read data set 2 into a hash and then parse set 1.

    HTH! :)

    Cheers Rolf

    (addicted to the Perl Programming Language)

Re: Merging two data sets
by vinoth.ree (Monsignor) on Jun 13, 2014 at 03:39 UTC

    Hi,

    As suggested by LanX please find the below code

    use strict; use warnings; use Data::Dumper; my %file1hash; my $fin1="./dataset1.txt"; my $fin2="./dataset2.txt"; my ($fh1,$fh2); open ($fh2, "<", "$fin2") or die $!; %file1hash = map{ chomp; split / /; } <$fh2>; print Dumper \%file1hash; close $fh2; open ($fh1, "<", "$fin1") or die $!; while(<$fh1>){ chomp; my ($id, $x1, $x2) = split / /; if (exists $file1hash{$id}) { printf ("%1s %5s %7s %10s\n", $id, $x1, $x2, $file1has +h{$id}); } } close $fh1; exit;

    All is well
Re: Merging two data sets
by Jim (Curate) on Jun 13, 2014 at 05:06 UTC
    C:\>perl -e "system('awk \"FNR == NR { if (FNR != 1) word[$1] = $2; ne +xt } $1 in word { print $0, word[$1] }\" DataSet2.txt DataSet1.txt')" A 12 45 hello A 10 12 hello A 15 74 hello C 11 41 blue C 18 12 blue B 23 15 goodbye D 19 10 happy D 11 12 happy E 14 5 black E 34 31 black C:\>cat DataSet2.txt id word A hello B goodbye D happy E black C blue C:\>cat DataSet1.txt id x1 x2 A 12 45 A 10 12 A 15 74 C 11 41 C 18 12 B 23 15 D 19 10 D 11 12 E 14 5 E 34 31 F 10 45 C:\>

    And here's the pure AWK version…

    C:\>awk "FNR == NR { if (FNR > 1) word[$1] = $2; next } $1 in word { p +rint $0, word[$1] }" DataSet2.txt DataSet1.txt A 12 45 hello A 10 12 hello A 15 74 hello C 11 41 blue C 18 12 blue B 23 15 goodbye D 19 10 happy D 11 12 happy E 14 5 black E 34 31 black C:\>
Re: Merging two data sets
by InfiniteSilence (Curate) on Jun 13, 2014 at 20:28 UTC

    Think in the following order,

    • Description of problem
    • Data structures to hold data
    • Relevant operations on those data structures to perform the desired operation/processing
    #!/usr/bin/perl -w use strict; my %file1hash; my ($fin1, $fin2) = qw|alpha.dat beta.dat|; open (DAT1,"$fin1"); while(my $line=<DAT1>) { chomp($line); next if $line=~m/^id/; my ($id,$x1,$x2)=split(' ' ,$line); push @{$file1hash{$id}}, "$x1 $x2"; } close DAT1; open (DAT2,"$fin2"); while (my $line=<DAT2>) { chomp($line); next if $line=~m/^id/; my ($id,$word)=split(' ', $line); if (exists $file1hash{$id}) { for my $x (@{$file1hash{$id}}){ print qq|$id\t$x\t$word\n|; } } } close DAT2;

    Makes..

    A 12 45 hello A 10 12 hello A 15 74 hello B 23 15 goodbye D 19 10 happy D 11 12 happy E 14 5 black E 34 31 black C 11 41 blue C 18 12 blue

    Analysis: Your code was creating a hash with simple elements when what you really wanted was a hash of array references (incorrect choice of data structure). When you have > 1 element you can iterate over them and get them all.

    Celebrate Intellectual Diversity