Should I use a hash for this?

awos22 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Should I use a hash for this? by jettero (Monsignor) on Apr 15, 2009 at 15:13 UTC
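This is something I'd probably use Set::Scalar for. It does all the set stuff for you:

```perl
use strict;
use warnings;
use Set::Scalar;

my %hash;
$hash{g1} = Set::Scalar->new(qw(ATG1 ATG2 ATG4 ATRG7));
$hash{g2} = Set::Scalar->new(qw(ATG1 x y z));

# Merge g2 into g1 when the two sets share at least one member.
if ( $hash{g1}->intersection($hash{g2})->size ) {
    print "merging g1 and g2\n";
    # insert() takes members, so pass g2's members rather than the set object itself.
    $hash{g1}->insert( delete($hash{g2})->members );
}
```

Not sure if that really answers the question, though.

-Paul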
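If installing Set::Scalar isn't an option, the same intersect-then-merge idea can be sketched with core Perl by treating the keys of a hash as a set. This is a minimal sketch, not part of the original reply; the `make_set` helper and the `%groups` name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: a "set" is a hash reference whose keys are the members.
sub make_set {
    my %set;
    $set{$_} = 1 for @_;
    return \%set;
}

my %groups = (
    g1 => make_set(qw(ATG1 ATG2 ATG4 ATRG7)),
    g2 => make_set(qw(ATG1 x y z)),
);

# Intersection: members of g1 that also exist in g2.
my @shared = grep { exists $groups{g2}{$_} } keys %{ $groups{g1} };

if (@shared) {
    print "merging g1 and g2\n";
    # Union: copy g2's members into g1, then drop g2 entirely.
    my $g2 = delete $groups{g2};
    $groups{g1}{$_} = 1 for keys %$g2;
}

print join(',', sort keys %{ $groups{g1} }), "\n";
```

With these two groups it should print `merging g1 and g2` followed by `ATG1,ATG2,ATG4,ATRG7,x,y,z`.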
Re: Should I use a hash for this? by targetsmart (Curate) on Apr 15, 2009 at 16:19 UTC
It would be good to see what you have tried in the first place, but I won't let that stop me. Here is one way of doing it.

Input hash:

```perl
my %hash = (
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
    Group4 => 'ATG1,MAM2',
    Group5 => 'LSM2',
);
```

Code:

```perl
use strict;
use warnings;
use Data::Dumper;

my %hash = (
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
    Group4 => 'ATG1,MAM2',
    Group5 => 'LSM2',
);

my (%newhash, $i, %combinegroups, %splittedhash);
foreach my $key (keys %hash) {
    # Split each group's string into its individual values.
    my @values;
    if ($hash{$key} =~ /,/) {
        @values = split(/,/, $hash{$key});
        $splittedhash{$key} = [ @values ];
    } else {
        push(@values, $hash{$key});
        $splittedhash{$key} = [ @values ];
    }
    if (++$i == 1) {
        # The first group processed claims all of its values.
        my @newvalues;
        push(@newvalues, $key) for (0 .. scalar @values);
        @newhash{@values} = @newvalues;
    } else {
        # A value already claimed by another group marks the pair for merging.
        foreach (@values) {
            if (exists $newhash{$_}) {
                $combinegroups{ $newhash{$_} . ":" . $key }++;
            } else {
                $newhash{$_} = $key;
            }
        }
    }
}

# Merge each marked pair into the first group and delete the second.
foreach my $key (keys %combinegroups) {
    my ($firstgroup, $secondgroup) = split(/:/, $key);
    my %unique;
    @unique{ @{ $splittedhash{$firstgroup} }, @{ $splittedhash{$secondgroup} } } = ();
    $hash{$firstgroup} = join(',', keys %unique);
    @{ $splittedhash{$firstgroup} } = keys %unique;
    delete $hash{$secondgroup};
}
print "FINAL HASH\n", Dumper \%hash;
```

Output:

```
FINAL HASH
$VAR1 = {
          'Group4' => 'ATG9,ATG2,ATRG7,ATG4,ATG1,MAM2',
          'Group5' => 'FYCO1,LSM2'
        };
```

(Which group names survive, and the order of the values, depend on Perl's hash ordering, so your run may differ.)

But this code will behave wrongly in one situation: if value1 in Group1 matches Group2 and, at the same time, value2 in Group1 matches Group3. It might have some other bugs as well, but as a starting point this code works; the rest is up to you. I haven't worried about efficiency (it could certainly be written more efficiently), but it should work for your case:

> if a value from 1 group (key) is found as a value from another group then I want to combine the values from both of those groups into a single set of values

Vivek

-- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
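The limitation described above (one group overlapping two other groups) goes away if merging is treated as transitive, i.e. as finding connected groups. One standard way to do that is a union-find (disjoint-set) structure over the group names. The sketch below illustrates that technique; it is not the code from this reply. `find`, `union`, `%parent` and `%first_group_for` are hypothetical names, and it assumes that any shared gene, directly or through a chain of groups, should merge groups.

```perl
use strict;
use warnings;

my %hash = (
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
    Group4 => 'ATG1,MAM2',
    Group5 => 'LSM2',
);

# Union-find over group names: each group starts as its own representative.
my %parent;
$parent{$_} = $_ for keys %hash;

sub find {
    my ($g) = @_;
    # Walk up to the root, then point every visited node straight at it.
    my $root = $g;
    $root = $parent{$root} while $parent{$root} ne $root;
    while ($parent{$g} ne $root) {
        my $next = $parent{$g};
        $parent{$g} = $root;
        $g = $next;
    }
    return $root;
}

sub union {
    my $x = find($_[0]);
    my $y = find($_[1]);
    return if $x eq $y;
    # Keep the alphabetically smaller name as root so the output is reproducible.
    ($x, $y) = ($y, $x) if $y lt $x;
    $parent{$y} = $x;
}

# Join every group containing a gene with the first group seen holding that gene.
my %first_group_for;
for my $group (sort keys %hash) {
    for my $gene (split /,/, $hash{$group}) {
        if (exists $first_group_for{$gene}) {
            union($first_group_for{$gene}, $group);
        } else {
            $first_group_for{$gene} = $group;
        }
    }
}

# Collect each group's genes under its representative, without duplicates.
my %merged;
for my $group (keys %hash) {
    $merged{ find($group) }{$_} = 1 for split /,/, $hash{$group};
}

for my $root (sort keys %merged) {
    print "$root => ", join(',', sort keys %{ $merged{$root} }), "\n";
}
```

On this input it should print `Group1 => ATG1,ATG2,ATG4,ATG9,ATRG7,MAM2` and `Group3 => FYCO1,LSM2`. Because groups are only ever linked, never deleted mid-loop, a group that bridges two others cannot resurrect or drop a group the way the pairwise merge can.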
Re: Should I use a hash for this? by ELISHEVA (Prior) on Apr 15, 2009 at 20:00 UTC
Is your input a given, or is `%hash` something you created? In either case, I think you will find this problem much easier to solve if you store the groups and their members in an HoH (hash of hashes). By storing the list of genes in a string, you are making the comparison much harder: to do any comparison you have to parse the string into individual genes and then search one group using the genes from another. If the gene lists were stored in hashes, most of this work would be done for you. A hash of hashes for your data would look like this: […]

Best,
beth
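A minimal sketch of the HoH approach, assuming the comma-separated input shown in the other replies (this is not beth's collapsed example): each string is parsed once into an inner hash, after which asking whether a group contains a gene is a single `exists` lookup. `%groups` and `shared_genes` are illustrative names.

```perl
use strict;
use warnings;

my %hash = (
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
);

# Parse each string once into an inner hash: group => { gene => 1 }.
my %groups;
for my $group (keys %hash) {
    $groups{$group}{$_} = 1 for split /,/, $hash{$group};
}

# With an HoH, membership is one exists() lookup; no re-parsing needed.
sub shared_genes {
    my ($x, $y) = @_;
    my @shared = grep { exists $groups{$y}{$_} } keys %{ $groups{$x} };
    return sort @shared;
}

print "Group1/Group2 share: ", join(',', shared_genes('Group1', 'Group2')), "\n";
print "Group1/Group3 share: ", join(',', shared_genes('Group1', 'Group3')), "\n";
```

Here `shared_genes('Group1', 'Group2')` returns `ATG1`, and `shared_genes('Group1', 'Group3')` returns an empty list.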
Re^2: Should I use a hash for this? by awos22 (Initiate) on Apr 16, 2009 at 19:29 UTC
Thanks to all of you for helping me out with this! I'm a little embarrassed that I didn't check my thread before now, but I got pulled into grant-related work (I have to pay the bills somehow). Anyway, now I can hopefully get back to the more fun stuff, which is of course writing Perl scripts. :) I will implement your various suggestions and see which one works best and can handle some of the more complicated data I will need to analyze. I just wanted to post something right away to let you all know how grateful I am for the amazing suggestions; you guys are awesome. I'll post something later to let you know how it all goes. By the way, nice job calling them genes, Beth; I was wondering if anyone would notice. ;)

-Mat
Re: Should I use a hash for this? by jrsimmon (Hermit) on Apr 15, 2009 at 17:05 UTC
Need some more information. How would these hashes be handled?

```perl
%hash = (
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
    Group4 => 'ATG4,ABC'
);
```

```perl
%hash = (
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
    Group4 => 'ATG4,LSM2'
);
```
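One way to see why these two inputs matter is to list which pairs of groups share a gene. This sketch does a plain pairwise comparison, using inner hashes for the lookups; the `overlapping_pairs` helper is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return "A-B" for every pair of groups sharing a gene.
sub overlapping_pairs {
    my (%hash) = @_;
    my %genes;    # group => { gene => 1 }
    for my $group (keys %hash) {
        $genes{$group}{$_} = 1 for split /,/, $hash{$group};
    }
    my @names = sort keys %genes;
    my @pairs;
    for my $i (0 .. $#names) {
        for my $j ($i + 1 .. $#names) {
            my ($x, $y) = @names[$i, $j];
            push @pairs, "$x-$y"
                if grep { exists $genes{$y}{$_} } keys %{ $genes{$x} };
        }
    }
    return @pairs;
}

my @case_a = overlapping_pairs(
    Group1 => 'ATG1,ATG2,ATG4,ATRG7', Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',           Group4 => 'ATG4,ABC',
);
my @case_b = overlapping_pairs(
    Group1 => 'ATG1,ATG2,ATG4,ATRG7', Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',           Group4 => 'ATG4,LSM2',
);

print "case A: @case_a\n";
print "case B: @case_b\n";
```

For the first hash it should print `case A: Group1-Group2 Group1-Group4`, so Group4 touches only Group1. For the second it should print `case B: Group1-Group2 Group1-Group4 Group3-Group4`, so Group4 links the ATG groups with the LSM2 group. Whether all four groups should then collapse into one depends on whether merging is meant to be transitive, which appears to be what these two examples are probing.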