awos22 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings all. I am new to the site (at least as a registered user), but plan to step up my attendance here as I am writing more and more Perl these days. Here is my question and I hope someone out there has a solution for me.

Here is what I currently have, in a hash:

%hash= ( Group1 => 'ATG1,ATG2,ATG4,ATRG7', Group2 => 'ATG1,ATG9', Group3 => 'FYCO1,LSM2' );

What I would like to do is end up with something like this:
newGroup1 = ATG1,ATG2,ATG4,ATRG7,ATG9
newGroup2 = FYCO1,LSM2

Basically, if a value from 1 group (key) is found as a value from another group then I want to combine the values from both of those groups into a single set of values. I hope that makes sense. I've been working on this for a couple of days now and can't figure a good way to do it. :)

Thanks for any help! Glad to be a part of this site now.
-Mat

Re: Should I use a hash for this?
by jettero (Monsignor) on Apr 15, 2009 at 15:13 UTC

    This is something I'd probably use Set::Scalar for. It does all the set stuff for you.

    use Set::Scalar;

    $hash{g1} = Set::Scalar->new(qw(ATG1 ATG2 ATG4 ATRG7));
    $hash{g2} = Set::Scalar->new(qw(ATG1 x y z));
    if( $hash{g1}->intersection($hash{g2}) ) {
        print "merging g1 and g2\n";
        $hash{g1}->insert(delete $hash{g2})
    }

    Not sure if that really answers the question though.

    -Paul

Re: Should I use a hash for this?
by targetsmart (Curate) on Apr 15, 2009 at 16:19 UTC
    It would be good to see what you have tried in the first place

    but I won't stop with that

    Here is one way of doing it
    input hash

    my %hash = (
        Group1 => 'ATG1,ATG2,ATG4,ATRG7',
        Group2 => 'ATG1,ATG9',
        Group3 => 'FYCO1,LSM2',
        Group4 => 'ATG1,MAM2',
        Group5 => 'LSM2',
    );
    code
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = (
        Group1 => 'ATG1,ATG2,ATG4,ATRG7',
        Group2 => 'ATG1,ATG9',
        Group3 => 'FYCO1,LSM2',
        Group4 => 'ATG1,MAM2',
        Group5 => 'LSM2',
    );
    my (%newhash, $i, %combinegroups, %splittedhash);
    foreach my $key (keys %hash) {
        my @values;
        if ($hash{$key} =~ /,/) {
            @values = split(/,/, $hash{$key});
            $splittedhash{$key} = [ @values ];
        } else {
            push(@values, $hash{$key});
            $splittedhash{$key} = [ @values ];
        }
        if (++$i == 1) {
            my @newvalues;
            push(@newvalues, $key) for (0 .. scalar @values);
            @newhash{@values} = @newvalues;
        } else {
            foreach (@values) {
                if (exists $newhash{$_}) {
                    $combinegroups{ $newhash{$_} . ":" . $key }++;
                } else {
                    $newhash{$_} = $key;
                }
            }
        }
    }
    foreach my $key (keys %combinegroups) {
        my ($firstgroup, $secondgroup) = split(/:/, $key);
        my %unique;
        @unique{ @{ $splittedhash{$firstgroup} }, @{ $splittedhash{$secondgroup} } } = ();
        $hash{$firstgroup} = join(',', keys %unique);
        @{ $splittedhash{$firstgroup} } = keys %unique;
        delete $hash{$secondgroup};
    }
    print "FINAL HASH\n", Dumper \%hash;
    output
    FINAL HASH
    $VAR1 = {
              'Group4' => 'ATG9,ATG2,ATRG7,ATG4,ATG1,MAM2',
              'Group5' => 'FYCO1,LSM2'
            };
    However, this code will behave incorrectly in one case: when value1 in Group1 matches a value in Group2 and, at the same time, value2 in Group1 matches a value in Group3.
    It might have some other bugs as well,
    but as a starting point this code will work; the rest is up to you.

    I haven't worried about efficiency in my code; it could be written more efficiently, but it will work for the case you described:
    "if a value from 1 group (key) is found as a value from another group then I want to combine the values from both of those groups into a single set of values"
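    The failure case mentioned above (one group overlapping two different groups) could be handled by treating the overlaps as transitive. The following is only a minimal sketch under that assumption, using a union-find over the individual values; merge_groups is a hypothetical helper name and not part of the code above.

```perl
use strict;
use warnings;

my %hash = (
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
    Group4 => 'ATG1,MAM2',
    Group5 => 'LSM2',
);

# merge_groups is a hypothetical helper (an assumption for illustration).
# It returns one comma-joined string per merged group, assuming that
# overlaps chain: if A shares a value with B and B with C, all three merge.
sub merge_groups {
    my (%groups) = @_;
    my %parent;    # union-find forest: value => parent value

    # Find the representative of a value, registering it on first sight.
    my $find = sub {
        my ($x) = @_;
        $parent{$x} = $x unless exists $parent{$x};
        while ($parent{$x} ne $x) {
            $parent{$x} = $parent{ $parent{$x} };    # path halving
            $x = $parent{$x};
        }
        return $x;
    };

    # Union every value of a group with the first value of that group.
    for my $name (sort keys %groups) {
        my ($first, @rest) = split /,/, $groups{$name};
        next unless defined $first;
        my $root = $find->($first);
        for my $value (@rest) {
            my $r = $find->($value);
            $parent{$r} = $root;
        }
    }

    # Collect the values under each representative.
    my %members;
    for my $value (keys %parent) {
        push @{ $members{ $find->($value) } }, $value;
    }
    my @merged = map { join ',', sort @$_ } values %members;
    return sort @merged;
}

print "$_\n" for merge_groups(%hash);
# ATG1,ATG2,ATG4,ATG9,ATRG7,MAM2
# FYCO1,LSM2
```

    Group names are dropped in this sketch because the question renames the merged groups anyway; they could be re-attached afterwards if needed.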


    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
Re: Should I use a hash for this?
by ELISHEVA (Prior) on Apr 15, 2009 at 20:00 UTC

    Is your input a given or is %hash something you created?

    In either case, I think you will find this problem much easier to solve if you store the groups and their members in an HoH (hash of hashes). By storing the list of genes in a string you are making the comparison much harder. To do any comparison you have to parse the string into individual genes and then search one group using the genes from another. If the gene lists were stored in hashes, then most of this work would be done for you.

    A hash of hashes for your data would look like this:
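    A minimal sketch of such a structure, assuming each gene is stored as a key with a placeholder value of 1. The variable name %groups is only illustrative, and the structure is built here from the comma-separated strings in the question rather than written out literally.

```perl
use strict;
use warnings;

my %hash = (
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
);

# Build the hash of hashes: group name => { gene => 1, ... }.
# %groups is an illustrative name, not taken from the original post.
my %groups;
for my $name (keys %hash) {
    $groups{$name} = { map { $_ => 1 } split /,/, $hash{$name} };
}

# Written out by hand, the same structure would be:
#   Group1 => { ATG1 => 1, ATG2 => 1, ATG4 => 1, ATRG7 => 1 },
#   Group2 => { ATG1 => 1, ATG9 => 1 },
#   Group3 => { FYCO1 => 1, LSM2 => 1 },

# Membership tests become simple hash lookups, so finding the genes
# two groups have in common needs no string parsing.
my @shared = grep { exists $groups{Group2}{$_} } sort keys %{ $groups{Group1} };
print "Group1 and Group2 share: @shared\n";    # Group1 and Group2 share: ATG1
```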

    Best, beth

      Thanks to all of you for helping me out with this!
      I'm a little embarrassed that I didn't check my thread before now...but I was trapped into doing other things grant related (have to pay the bills somehow).

      Anyway, now I can hopefully get back to the more fun stuff, which is of course writing Perl scripts. :)

      I will implement your various suggestions and see which one works the best and can handle some of the more complicated data that I will need to analyze. I just wanted to post something right away to let you all know how grateful I am for the amazing suggestions - you guys are awesome.

      I'll post something later to let you know how it all goes.

      By the way, nice job calling them genes Beth, I was wondering if anyone would notice. ;)

      -Mat

Re: Should I use a hash for this?
by jrsimmon (Hermit) on Apr 15, 2009 at 17:05 UTC
    Need some more information. How would these hashes be handled?
    %hash= ( Group1 => 'ATG1,ATG2,ATG4,ATRG7', Group2 => 'ATG1,ATG9', Group3 => 'FYCO1,LSM2', Group4 => 'ATG4,ABC' );
    %hash= ( Group1 => 'ATG1,ATG2,ATG4,ATRG7', Group2 => 'ATG1,ATG9', Group3 => 'FYCO1,LSM2', Group4 => 'ATG4,LSM2' );
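    The question does not say how these should be handled. Under one possible reading, where overlaps chain (if A overlaps B and B overlaps C, all three merge), the two inputs behave as in the sketch below. This is an assumption, not a confirmed requirement, and merge_overlapping is a hypothetical helper name.

```perl
use strict;
use warnings;

# merge_overlapping is a hypothetical helper (an assumption for illustration).
# It keeps a list of pairwise-disjoint sets; each new group absorbs every
# existing set it shares a value with.
sub merge_overlapping {
    my (%groups) = @_;
    my @merged;    # list of hashrefs, each one a set of values
    for my $name (sort keys %groups) {
        my %set = map { $_ => 1 } split /,/, $groups{$name};
        my @keep;
        for my $old (@merged) {
            if (grep { exists $old->{$_} } keys %set) {
                %set = (%set, %$old);    # overlap found: absorb the old set
            } else {
                push @keep, $old;        # no overlap: keep it unchanged
            }
        }
        @merged = (@keep, \%set);
    }
    my @out = map { join ',', sort keys %$_ } @merged;
    return sort @out;
}

my @first = merge_overlapping(
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
    Group4 => 'ATG4,ABC',
);
my @second = merge_overlapping(
    Group1 => 'ATG1,ATG2,ATG4,ATRG7',
    Group2 => 'ATG1,ATG9',
    Group3 => 'FYCO1,LSM2',
    Group4 => 'ATG4,LSM2',
);

print "first:  $_\n" for @first;
# first:  ABC,ATG1,ATG2,ATG4,ATG9,ATRG7
# first:  FYCO1,LSM2
print "second: $_\n" for @second;
# second: ATG1,ATG2,ATG4,ATG9,ATRG7,FYCO1,LSM2
```

    In the second input, Group4 bridges the ATG groups and Group3, so under this reading everything collapses into a single group.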