Is your input a given or is %hash something you created?
In either case, I think you will find this problem much easier to solve if you store the groups and their members in an HoH (hash of hashes). Storing each list of genes in a string makes the comparison much harder: to compare anything, you first have to parse the string into individual genes and then search one group using the genes from another. If the gene lists were stored in hashes, most of this work would be done for you.
A hash of hashes for your data would look like this:
    my %hGroups = (
        'Group1' => { 'ATRG7' => 1, 'ATG2' => 1, 'ATG4' => 1, 'ATG1' => 1 },
        'Group3' => { 'FYCO1' => 1, 'LSM2' => 1 },
        'Group2' => { 'ATG9' => 1, 'ATG1' => 1 }
    );
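As a rough sketch of why this layout helps: checking whether two groups share a gene comes down to one exists test per gene, with no string parsing. The helper name shared_genes is made up for illustration and is not part of the original code.

```perl
use strict;
use warnings;

my %hGroups = (
    'Group1' => { 'ATRG7' => 1, 'ATG2' => 1, 'ATG4' => 1, 'ATG1' => 1 },
    'Group3' => { 'FYCO1' => 1, 'LSM2' => 1 },
    'Group2' => { 'ATG9' => 1, 'ATG1' => 1 },
);

# shared_genes is a hypothetical helper: it returns the sorted list of
# genes that appear as keys in both member hashes.
sub shared_genes {
    my ($hA, $hB) = @_;
    my @aShared = grep { exists $hB->{$_} } keys %$hA;
    my @aSorted = sort @aShared;
    return @aSorted;
}

my @aShared = shared_genes($hGroups{'Group1'}, $hGroups{'Group2'});
print "Group1 and Group2 share: @aShared\n";   # prints "Group1 and Group2 share: ATG1"
```

Doing the same check against the comma-delimited strings would need a split for each group before any comparison could start.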
If your data is given to you as comma-delimited gene lists, you will need to convert %hash. You can use map, split, and keys to do the conversion:
    my %hGroups = map {
        my $sGenes = $hash{$_};
        my $hGroupMembers = { map { $_ => 1 } split(',', $sGenes) };
        $_ => $hGroupMembers;
    } keys %hash;
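For a concrete picture, here is a minimal sketch of the same conversion run on a small sample. The layout of %hash is an assumption (each group name mapped to one comma-delimited string), and the split pattern is a small variation that also tolerates spaces around the commas, which the split(',', ...) call above does not.

```perl
use strict;
use warnings;

# Assumed input layout: group name => comma-delimited gene list
my %hash = (
    'Group1' => 'ATRG7,ATG2, ATG4,ATG1',
    'Group2' => 'ATG9,ATG1',
);

my %hGroups;
foreach my $sGroup (keys %hash) {
    # split on commas, ignoring any spaces around them
    my @aGenes = split /\s*,\s*/, $hash{$sGroup};
    $hGroups{$sGroup} = { map { $_ => 1 } @aGenes };
}

# print each group with its genes in sorted order
foreach my $sGroup (sort keys %hGroups) {
    print "$sGroup: " . join(',', sort keys %{ $hGroups{$sGroup} }) . "\n";
}
```

A plain foreach is used here only to keep each step visible; it builds the same structure as the map version.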
Once you have your data in hash of hashes form, regrouping can be done easily with the help of exists. In this code sample, %hGroups is the hash of hashes above. %hNewGroups will store the new groupings:
    my %hNewGroups;

    oldgroup:
    foreach my $sGroup (keys %hGroups) {
        my $hGroupMembers = $hGroups{$sGroup};

        # check each new group for genes in common with
        # current old group ($sGroup)
        foreach my $sNewGroup (keys %hNewGroups) {

            # check genes in the old group to see if any are
            # in the new group ($sNewGroup).
            # Note: exists checks for the key itself, without
            # depending on the value stored for $sGene
            my $hNewGroupMembers = $hNewGroups{$sNewGroup};
            foreach my $sGene (keys %$hGroupMembers) {
                if (exists($hNewGroupMembers->{$sGene})) {
                    # merge the old group's genes into the new group
                    $hNewGroups{$sNewGroup}
                        = { %$hNewGroupMembers, %$hGroupMembers };
                    next oldgroup;
                }
            }
        }

        # create a new group, since no gene is in common with
        # other groups found so far
        $hNewGroups{$sGroup} = $hGroupMembers;
    }
The contents of %hNewGroups will be something like this:
    %hNewGroups = (
        'Group1' => { 'ATG9' => 1, 'ATG2' => 1, 'ATRG7' => 1, 'ATG4' => 1, 'ATG1' => 1 },
        'Group3' => { 'FYCO1' => 1, 'LSM2' => 1 }
    );
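One case the single pass may not cover: an old group that overlaps two different new groups is merged into only the first one it matches, so those two new groups stay separate even though they are now linked. If that can happen in your data, one possible variation (a sketch, not the code above) is to keep merging pairs until nothing changes. The name merge_groups and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# merge_groups is a hypothetical helper: it takes a reference to a hash
# of hashes (group name => { gene => 1, ... }) and returns a new one in
# which groups linked by shared genes, directly or through other groups,
# have been combined under the alphabetically first group name.
sub merge_groups {
    my ($hGroups) = @_;

    # work on shallow copies so the caller's data is left untouched
    my %hMerged;
    foreach my $sGroup (keys %$hGroups) {
        $hMerged{$sGroup} = { %{ $hGroups->{$sGroup} } };
    }

    my $bChanged = 1;
    while ($bChanged) {
        $bChanged = 0;
        my @aNames = sort keys %hMerged;
        PAIR: foreach my $i (0 .. $#aNames) {
            foreach my $j ($i + 1 .. $#aNames) {
                my ($sKeep, $sDrop) = @aNames[$i, $j];
                my $hDrop = $hMerged{$sDrop};

                # skip this pair unless the two groups share a gene
                next unless grep { exists $hMerged{$sKeep}{$_} } keys %$hDrop;

                # fold the second group into the first, then rescan
                $hMerged{$sKeep} = { %{ $hMerged{$sKeep} }, %$hDrop };
                delete $hMerged{$sDrop};
                $bChanged = 1;
                last PAIR;
            }
        }
    }
    return \%hMerged;
}

# Made-up sample: Group3 links Group1 and Group2 together
my %hSample = (
    'Group1' => { 'ATG1'  => 1, 'ATG2' => 1 },
    'Group2' => { 'ATG4'  => 1, 'ATG9' => 1 },
    'Group3' => { 'ATG2'  => 1, 'ATG4' => 1 },
    'Group4' => { 'FYCO1' => 1 },
);

my $hResult = merge_groups(\%hSample);
foreach my $sGroup (sort keys %$hResult) {
    print "$sGroup: " . join(',', sort keys %{ $hResult->{$sGroup} }) . "\n";
}
# prints:
# Group1: ATG1,ATG2,ATG4,ATG9
# Group4: FYCO1
```

Rescanning after every merge is simple rather than fast; for a handful of groups that should not matter, but a large data set might call for a different approach.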
You can always get back to comma-delimited lists later on by using code like this:
    while (my ($sNewGroup, $hMembers) = each(%hNewGroups)) {
        print "$sNewGroup: " . join(',', keys %$hMembers) . "\n";
    }
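which prints out something like this (hash key order is not guaranteed, so the groups and genes may come out in a different order):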
    Group1: ATG9,ATG2,ATRG7,ATG4,ATG1
    Group3: FYCO1,LSM2
Best, beth
In reply to Re: Should I use a hash for this? by ELISHEVA, in thread Should I use a hash for this? by awos22