in reply to Re: Hash w/ multiple values + merging
in thread Hash w/ multiple values + merging

Thanks for the reply! I tried to print out %hash1 to check whether it holds the right data; it lists the keys as expected, but the values show up as arrays (okay) without their contents being shown explicitly:
ATA ARRAY(0x183d294) CTT ARRAY(0x183d304) CTG ARRAY(0x182a674) TTA ARRAY(0x183d464) ATG ARRAY(0x278eb4)

Re^3: Hash w/ multiple values + merging
by planetscape (Chancellor) on Feb 07, 2010 at 23:08 UTC
      So this is the script so far:
      #!/usr/local/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my ($key, $val);

      open(FH1, "$ARGV[0]") or die("Error opening File1 $!\n");
      my %hash1;
      while (<FH1>) {
          next if /(sample|value\d)/;
          my ($key, $val1, $val2) = split /\s+/;
          push @{$hash1{$key}}, ($val1, $val2);
      }
      close FH1;

      open(FH2, "$ARGV[1]") or die("Error opening File1 $!\n");
      my %hash2;
      while (<FH2>) {
          next if /(sample|value\d)/;
          my ($key, $val3) = split /\s+/;
          $hash2{$key} = $val3;
      }
      close FH2;

      my %hash3 = %hash1;
      foreach my $key2 ( keys %hash2 ) {
          if ( exists $hash3{$key2} ) {
              push @{$hash3{$key2}}, ($hash2{$key2});
          }
          else {
              delete $hash3{$key};
          }
      }

      print Data::Dumper->Dump([\%hash1],['FIRST HASH']),"\n";
      print Data::Dumper->Dump([\%hash2],['SECOND HASH']),"\n";
      print Data::Dumper->Dump([\%hash3],['MERGED HASH']),"\n";
      - It reads the files into the hashes correctly.
      - It finds the duplicates and updates the value array for the duplicate keys.
      - But it does not get rid of the unmatched keys and their values (a rough idea for fixing that follows below).
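      Perhaps walking over the merged hash itself and deleting every key that %hash2 does not have would do it? Something along these lines (untested sketch, using the hashes from the script above):

      # untested idea: drop from %hash3 any key that does not also appear in %hash2
      for my $key3 ( keys %hash3 ) {
          delete $hash3{$key3} unless exists $hash2{$key3};
      }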
Re^3: Hash w/ multiple values + merging
by Corion (Patriarch) on Feb 07, 2010 at 22:27 UTC

    You're printing out the stringified references. An easy way to inspect data structures is Data::Dumper.
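    For example (assuming the %hash1 from your script):

    use Data::Dumper;              # core module
    print Dumper(\%hash1);         # shows each key and the array it points to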

      Thanks guys, you are really helping me out here. Now I need to find the duplicates and merge the two data sets for those duplicates while discarding the unmatched ones. I used the following script to find the duplicates, and it indeed found the common keys.
      foreach (do {
          my %matcher;
          @matcher{ map lc, keys %hash1 } = ();
          grep exists $matcher{lc $_}, keys %hash2;
      }) {
          print "$_ matches\n";
      }
      Yet I would like to merge the two data sets, so am I going about this wrong by finding the matches in an extra loop? Is there a way to merge two hashes conditionally on their common keys?
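      I was also wondering whether something along these lines would merge them in one pass (untested guess, assuming %hash1 holds array refs and %hash2 holds scalars as above):

      # keep only the common keys, appending the %hash2 value
      # to the array of values already stored under that key in %hash1
      my %merged;
      for my $key ( grep { exists $hash2{$_} } keys %hash1 ) {
          $merged{$key} = [ @{ $hash1{$key} }, $hash2{$key} ];
      }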
Re^3: Hash w/ multiple values + merging
by biohisham (Priest) on Feb 07, 2010 at 22:56 UTC
    What you are seeing is a reference to the data held under that key; "ARRAY(0x183d294)", for example, is the address where the anonymous array associated with "ATA" is stored. To access the values you need to dereference it with the appropriate syntax; read the links at the bottom of my previous reply.

    Since the reference in this case is to an ARRAY, something like "@{$hash1{ATA}}" would show you the values associated with "ATA". To access them one at a time you can use indices just as with any regular array; $hash1{ATA}[0] would print the first element of the anonymous array associated with the key "ATA".
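    For example (a minimal sketch, assuming the %hash1 built in your script):

    my @ata_values = @{ $hash1{ATA} };          # dereference the whole array
    print "first: $hash1{ATA}[0]\n";            # a single element by index
    print "all:   @{ $hash1{ATA} }\n";          # interpolate the full list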

    The Data::Dumper module will show you the data structures in stringified form so you can check that they look as you expected before proceeding any further...

    # ADD this to the previous code...
    use Data::Dumper;
    print Data::Dumper->Dump([\%hash1],['FIRST HASH']),"\n";
    print Data::Dumper->Dump([\%hash2],['SECOND HASH']),"\n";
    print Data::Dumper->Dump([\%hash3],['MERGED HASH']),"\n";

    Note also that there can be more than one way to do it which would become clearer when you start dealing with more complex data structures..


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
      Yes, it worked just fine. Thank you, biohisham! Could you also please guide me through the merge operation? What I have in mind is the following:
      1. Start iterating over the keys of %hash1.
      2. Check whether any of these keys match the keys of %hash2.
      3. If they match, push the values of %hash2 into the values of %hash1.
      Is this the right way to do it? Roughly, something like this (untested):
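      # untested sketch of steps 1-3, assuming %hash1 holds array refs and %hash2 holds scalars
      foreach my $key ( keys %hash1 ) {
          if ( exists $hash2{$key} ) {
              push @{ $hash1{$key} }, $hash2{$key};   # step 3: append the matching value
          }
      }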
        That way is correct; I don't know whether it is the best way out there, though, so I cannot say whether it is computationally expensive or optimal. Anyway, take another look at the code I posted, because I have already added this step using the same idea!