in reply to Hash w/ multiple values + merging

To associate more than one value with a hash key, you need a more advanced data structure: a hash of anonymous arrays. Here is one way to do that, reading the data from two files, File1.txt and File2.txt:
#!/usr/local/bin/perl
use strict;
use warnings;

open(FH1, "File1.txt") or die("Error opening File1 $!\n");
my %hash1;
while (<FH1>) {
    next if /(trip|valu1|valu2)/;    # skip the header
    my ($key, $val1, $val2) = split /\s+/;
    push @{$hash1{$key}}, ($val1, $val2);    # hash of anonymous arrays
}
close FH1;

open(FH2, "File2.txt") or die("Error opening File2 $!\n");
my %hash2;
while (<FH2>) {
    next if /(trip|value)/;    # skip the header
    my ($key, $val3) = split /\s+/;
    $hash2{$key} = $val3;
}
close FH2;

# collect the common keys into one hash
my %hash3;
my ($key1, $key2);
foreach $key1 (keys %hash1) {
    foreach $key2 (keys %hash2) {
        if ($key2 eq $key1) {
            push @{$hash3{$key1}}, @{$hash1{$key1}}, $hash2{$key2};
        }
    }
}

# print to STDOUT
print "TRIP\tvalue1\tvalue2\tvalue3\n";
foreach my $key (keys %hash3) {
    print "$key\t";
    print "@{$hash3{$key}}\n";
}
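If you want to see the hash-of-arrays part in isolation without creating the files, here is a small sketch. The input lines are made-up sample data standing in for File1.txt, and I swapped `split /\s+/` for `split ' '`, which also ignores leading whitespace on a line:

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for File1.txt:
# a header line followed by "key value1 value2" records.
my @lines = (
    "trip value1 value2\n",
    "ATA 1.5 2.0\n",
    "CTT 3.1 4.2\n",
    "ATA 5.0 6.0\n",    # a repeated key collects more values
);

my %hash1;
for my $line (@lines) {
    next if $line =~ /^trip\b/;    # skip the header line
    # split ' ' splits on runs of whitespace and, unlike split /\s+/,
    # does not produce an empty first field for leading whitespace
    my ($key, @vals) = split ' ', $line;
    # push autovivifies: if $hash1{$key} does not exist yet, Perl creates
    # a new anonymous array and stores a reference to it under $key
    push @{ $hash1{$key} }, @vals;
}

for my $key (sort keys %hash1) {
    print "$key: @{ $hash1{$key} }\n";
}
```

Because of autovivification there is no need to initialise `$hash1{$key}` to `[]` before the first `push`.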
UPDATE: perlref and perlreftut are additional must-reads: to manipulate these data structures you need to understand references and how to dereference them.


Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.

Re^2: Hash w/ multiple values + merging
by sophix (Sexton) on Feb 07, 2010 at 22:21 UTC
    Thanks for the reply! I tried to print out %hash1 to see whether it contains the right data. It lists the keys as expected, but the values are shown as array references (okay) rather than their actual contents:
    ATA ARRAY(0x183d294)
    CTT ARRAY(0x183d304)
    CTG ARRAY(0x182a674)
    TTA ARRAY(0x183d464)
    ATG ARRAY(0x278eb4)
        So this is the script so far:
        #!/usr/local/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my ($key, $val);
        open(FH1, "$ARGV[0]") or die("Error opening File1 $!\n");
        my %hash1;
        while (<FH1>) {
            next if /(sample|value\d)/;
            my ($key, $val1, $val2) = split /\s+/;
            push @{$hash1{$key}}, ($val1, $val2);
        }
        close FH1;

        open(FH2, "$ARGV[1]") or die("Error opening File2 $!\n");
        my %hash2;
        while (<FH2>) {
            next if /(sample|value\d)/;
            my ($key, $val3) = split /\s+/;
            $hash2{$key} = $val3;
        }
        close FH2;

        my %hash3 = %hash1;
        foreach my $key2 (keys %hash2) {
            if (exists $hash3{$key2}) {
                push @{$hash3{$key2}}, ($hash2{$key2});
            } else {
                delete $hash3{$key};
            }
        }

        print Data::Dumper->Dump([\%hash1], ['FIRST HASH']), "\n";
        print Data::Dumper->Dump([\%hash2], ['SECOND HASH']), "\n";
        print Data::Dumper->Dump([\%hash3], ['MERGED HASH']), "\n";
        - It reads the data into the hashes correctly.
        - It finds the duplicates and updates the value array for duplicate keys.
        - But it does not get rid of the unmatched keys and their values.
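A possible reason the unmatched keys survive: the merge loop only visits the keys of %hash2, so keys that exist only in %hash3 are never looked at, and the `$key` used in the `else` branch is the outer `my ($key, $val)` variable, which is still undef. Also, `my %hash3 = %hash1;` copies only the array references, so pushing onto `@{$hash3{...}}` changes %hash1 too. A minimal sketch of one fix, assuming small hypothetical hashes in the same shape as yours: copy each array, then loop over the keys of the hash you are pruning.

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as the script's %hash1 / %hash2
my %hash1 = (ATA => [1, 2], CTT => [3, 4], TTA => [5, 6]);
my %hash2 = (ATA => 7, TTA => 8, GGG => 9);

# Copy every array so that changes to %hash3 do not leak into %hash1
my %hash3 = map { ($_ => [ @{ $hash1{$_} } ]) } keys %hash1;

# Walk the keys of the hash being pruned, not those of %hash2
for my $key (keys %hash3) {
    if (exists $hash2{$key}) {
        push @{ $hash3{$key} }, $hash2{$key};    # matched: append value3
    }
    else {
        # keys() built its list before the loop started, so deleting here is safe
        delete $hash3{$key};
    }
}

print "$_\t@{ $hash3{$_} }\n" for sort keys %hash3;
```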

      You're printing out the stringified references. An easy way to inspect data structures is Data::Dumper.
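      To see the difference between a stringified reference and its contents, here is a tiny sketch with one made-up key. `ref` reports the kind of reference, and `"@$ref"` dereferences it and interpolates the elements:

```perl
use strict;
use warnings;

my %hash1 = (ATA => [1.5, 2.5]);    # hypothetical sample data
my $ref = $hash1{ATA};              # the value is a reference to an anonymous array

my $as_string = "$ref";    # stringified reference, something like ARRAY(0x...)
my $type      = ref $ref;  # the reference type: "ARRAY"
my $contents  = "@$ref";   # dereferenced and interpolated: the elements themselves

print "$as_string\n$type\n$contents\n";
```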

        Thanks guys, you are really helping me out here. Now I need to find the duplicates and merge the two data sets for these duplicates while discarding the unmatched ones. I used the following code to find the duplicates, and it did find the common keys:
        foreach (do {
            my %matcher;
            @matcher{map lc, keys %hash1} = ();
            grep exists $matcher{lc $_}, keys %hash2;
        }) {
            print "$_ matches\n";
        }
        Yet I would like to merge the two data sets. Am I going about this the wrong way by finding the matches in a separate loop? Is there a way to merge two hashes conditional on common keys?
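      One way to merge on common keys without a separate matching loop is to let `grep` with `exists` do the matching (the same idea as the matcher above, minus the `lc` case folding) and let `map` build the merged values. A sketch, assuming hypothetical data in the same shape as %hash1 and %hash2 and case-sensitive keys:

```perl
use strict;
use warnings;

my %hash1 = (ATA => [1, 2], CTT => [3, 4], TTA => [5, 6]);    # hypothetical
my %hash2 = (ATA => 7, TTA => 8, GGG => 9);                   # hypothetical

# grep keeps only the keys of %hash1 that also exist in %hash2;
# map builds, for each of them, a NEW anonymous array holding the two
# values from %hash1 followed by the value from %hash2.
# The parentheses inside map's block keep Perl from parsing { as a hash.
my %merged = map  { ($_ => [ @{ $hash1{$_} }, $hash2{$_} ]) }
             grep { exists $hash2{$_} } keys %hash1;

print "$_\t@{ $merged{$_} }\n" for sort keys %merged;
```

Unmatched keys (CTT, GGG here) simply never make it into %merged, so there is nothing to delete afterwards.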
      What you are seeing is a reference to the data held under that key: "ARRAY(0x183d294)", for example, is the memory address of the array holding the values associated with "ATA". To access those values you need to dereference the reference with the appropriate syntax; read the links at the bottom of my previous reply.

      Since the reference in this case is an ARRAY reference, something like "@{$hash{ATA}}" gives you all the values associated with "ATA". To access them one at a time, use indices as you would with any regular array: $hash{ATA}[0] gives the first element of the anonymous array associated with the key "ATA".
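      A short sketch of the common ways to get at the elements of one of these anonymous arrays, using a made-up %hash with a single key:

```perl
use strict;
use warnings;

my %hash = (ATA => ['1.5', '2.0', '7']);    # hypothetical sample data

my @all   = @{ $hash{ATA} };          # the whole anonymous array
my $first = $hash{ATA}[0];            # first element, same as $hash{ATA}->[0]
my $last  = $hash{ATA}[-1];           # last element
my $count = scalar @{ $hash{ATA} };   # number of elements

# $#{ $hash{ATA} } is the index of the last element
for my $i (0 .. $#{ $hash{ATA} }) {
    print "ATA[$i] = $hash{ATA}[$i]\n";
}
```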

      The Data::Dumper module prints your data structures in a readable form, so you can check that they look as you expect before proceeding any further.

      # Add this to the previous code:
      use Data::Dumper;
      print Data::Dumper->Dump([\%hash1], ['FIRST HASH']), "\n";
      print Data::Dumper->Dump([\%hash2], ['SECOND HASH']), "\n";
      print Data::Dumper->Dump([\%hash3], ['MERGED HASH']), "\n";
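      Two Data::Dumper settings can make this output easier to compare between runs: `$Data::Dumper::Sortkeys` prints hash keys in sorted order, and `$Data::Dumper::Indent = 1` uses a compact fixed indentation. Giving a name that starts with `*` makes it print the structure as `%hash1 = ( ... );` rather than as a hash reference. A sketch with made-up data:

```perl
use strict;
use warnings;
use Data::Dumper;

my %hash1 = (CTT => [3.1, 4.2], ATA => [1.5, 2.5]);    # hypothetical data

local $Data::Dumper::Sortkeys = 1;    # hash keys in sorted order
local $Data::Dumper::Indent   = 1;    # compact, fixed indentation

# The '*' prefix asks Dumper to describe the dereferenced hash: %hash1 = ( ... );
my $out = Data::Dumper->Dump([\%hash1], ['*hash1']);
print $out;
```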

      Note also that there is more than one way to do it; this will become clearer when you start dealing with more complex data structures.


        Yes, it worked just fine. Thank you, biohisham! Could you also please guide me through the merge operation? What I have in mind is the following:
        1. Start iterating over the keys of %hash1.
        2. Check whether each of these keys matches a key of %hash2.
        3. If it matches, push the value from %hash2 onto the values in %hash1.
        Is this the right way to do it?
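        Those three steps map directly onto Perl, and step 2 does not need an inner loop over %hash2, because `exists` looks the key up directly. A sketch that modifies %hash1 in place, assuming hypothetical data in the same shape as yours:

```perl
use strict;
use warnings;

my %hash1 = (ATA => [1, 2], CTT => [3, 4], TTA => [5, 6]);    # hypothetical
my %hash2 = (ATA => 7, TTA => 8, GGG => 9);                   # hypothetical

# 1. iterate over the keys of %hash1
for my $key (keys %hash1) {
    # 2. check whether the key also exists in %hash2 (a direct hash lookup)
    next unless exists $hash2{$key};
    # 3. append the %hash2 value to the array stored under that key
    push @{ $hash1{$key} }, $hash2{$key};
}

print "$_\t@{ $hash1{$_} }\n" for sort keys %hash1;
```

With this version, keys without a match (CTT here) stay in %hash1 with only their two values; if they should be discarded, an `else { delete $hash1{$key} }` branch in place of the `next` would remove them.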