in reply to Re: Hash w/ multiple values + merging
in thread Hash w/ multiple values + merging

Thanks for the reply! I tried to print out %hash1 to check whether it holds the right data; it lists the keys as expected, but the values show up as arrays (okay) without their contents being shown explicitly:
ATA ARRAY(0x183d294) CTT ARRAY(0x183d304) CTG ARRAY(0x182a674) TTA ARRAY(0x183d464) ATG ARRAY(0x278eb4)

Re^3: Hash w/ multiple values + merging
by planetscape (Chancellor) on Feb 07, 2010 at 23:08 UTC
      So this is the script so far:
      #!/usr/local/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my ($key, $val);

      open(FH1, "$ARGV[0]") or die("Error opening File1 $!\n");
      my %hash1;
      while (<FH1>) {
          next if /(sample|value\d)/;
          my ($key, $val1, $val2) = split /\s+/;
          push @{$hash1{$key}}, ($val1, $val2);
      }
      close FH1;

      open(FH2, "$ARGV[1]") or die("Error opening File1 $!\n");
      my %hash2;
      while (<FH2>) {
          next if /(sample|value\d)/;
          my ($key, $val3) = split /\s+/;
          $hash2{$key} = $val3;
      }
      close FH2;

      my %hash3 = %hash1;
      foreach my $key2 ( keys %hash2 ) {
          if ( exists $hash3{$key2} ) {
              push @{$hash3{$key2}}, ($hash2{$key2});
          }
          else {
              delete $hash3{$key};
          }
      }

      print Data::Dumper->Dump([\%hash1],['FIRST HASH']),"\n";
      print Data::Dumper->Dump([\%hash2],['SECOND HASH']),"\n";
      print Data::Dumper->Dump([\%hash3],['MERGED HASH']),"\n";
      - It reads the files into the hashes correctly.
      - It finds the duplicates and updates the value array for the duplicate keys.
      - But it does not get rid of the unmatched keys and their values (a rough idea for fixing that follows below).
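      Perhaps walking over the merged hash itself and deleting every key that %hash2 does not have would do it? Something along these lines (untested sketch, using the hashes from the script above):

      # untested idea: drop from %hash3 any key that does not also appear in %hash2
      for my $key3 ( keys %hash3 ) {
          delete $hash3{$key3} unless exists $hash2{$key3};
      }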
Re^3: Hash w/ multiple values + merging
by Corion (Patriarch) on Feb 07, 2010 at 22:27 UTC

    You're printing out the stringified references. An easy way to inspect data structures is Data::Dumper.
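    For example (assuming the %hash1 from your script):

    use Data::Dumper;              # core module
    print Dumper(\%hash1);         # shows each key and the array it points to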

      Thanks guys, you are really helping me out here. Now I need to find the duplicates and merge the two data sets for those duplicates while discarding the unmatched ones. I used the following script to find the duplicates, and it indeed found the common keys.
      foreach (do {
          my %matcher;
          @matcher{ map lc, keys %hash1 } = ();
          grep exists $matcher{lc $_}, keys %hash2;
      }) {
          print "$_ matches\n";
      }
      Yet I would like to merge the two data sets, so am I going about this wrong by finding the matches in an extra loop? Is there a way to merge two hashes conditionally on their common keys?
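      I was also wondering whether something along these lines would merge them in one pass (untested guess, assuming %hash1 holds array refs and %hash2 holds scalars as above):

      # keep only the common keys, appending the %hash2 value
      # to the array of values already stored under that key in %hash1
      my %merged;
      for my $key ( grep { exists $hash2{$_} } keys %hash1 ) {
          $merged{$key} = [ @{ $hash1{$key} }, $hash2{$key} ];
      }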
Re^3: Hash w/ multiple values + merging
by biohisham (Priest) on Feb 07, 2010 at 22:56 UTC
    What you are seeing is a reference to the data held under that key; "ARRAY(0x183d294)", for example, is the address where the anonymous array associated with "ATA" is stored. To access the values you need to dereference it with the appropriate syntax; read the links at the bottom of my previous reply.

    Since the reference in this case is to an ARRAY, something like "@{$hash1{ATA}}" would show you the values associated with "ATA". To access them one at a time you can use indices just as with any regular array; $hash1{ATA}[0] would print the first element of the anonymous array associated with the key "ATA".
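    For example (a minimal sketch, assuming the %hash1 built in your script):

    my @ata_values = @{ $hash1{ATA} };          # dereference the whole array
    print "first: $hash1{ATA}[0]\n";            # a single element by index
    print "all:   @{ $hash1{ATA} }\n";          # interpolate the full list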

    The Data::Dumper module will show you the data structures in stringified form so you can check that they look as you expected before proceeding any further...

    # ADD this to the previous code...
    use Data::Dumper;
    print Data::Dumper->Dump([\%hash1],['FIRST HASH']),"\n";
    print Data::Dumper->Dump([\%hash2],['SECOND HASH']),"\n";
    print Data::Dumper->Dump([\%hash3],['MERGED HASH']),"\n";

    Note also that there can be more than one way to do it which would become clearer when you start dealing with more complex data structures..


    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
      Yes, it worked just fine. Thank you, biohisham! Could you also please guide me through the merge operation? What I have in mind is the following:
      1. Start iterating over the keys of %hash1.
      2. Check whether any of these keys match the keys of %hash2.
      3. If they match, push the values of %hash2 into the values of %hash1.
      Is this the right way to do it? Roughly, something like this (untested):
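      # untested sketch of steps 1-3, assuming %hash1 holds array refs and %hash2 holds scalars
      foreach my $key ( keys %hash1 ) {
          if ( exists $hash2{$key} ) {
              push @{ $hash1{$key} }, $hash2{$key};   # step 3: append the matching value
          }
      }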
        That way is correct; I don't know whether it is the best way out there, though, so I cannot say whether it is computationally expensive or optimal. Anyway, take another look at the code I posted, because I have already added this step using the same idea!