Is your input a given or is %hash something you created?

In either case, I think you will find this problem much easier to solve if you store the groups and their members in an HoH (hash of hashes). By storing the list of genes in a string you are making the comparison much harder. To do any comparison you have to parse the string into individual genes and then search one group using the genes from another. If the gene lists were stored in hashes, most of this work would be done for you: checking whether a gene belongs to a group becomes a single hash lookup.

A hash of hashes for your data would look like this:

    my %hGroups = (
        'Group1' => { 'ATRG7' => 1, 'ATG2' => 1, 'ATG4' => 1, 'ATG1' => 1 },
        'Group3' => { 'FYCO1' => 1, 'LSM2' => 1 },
        'Group2' => { 'ATG9' => 1, 'ATG1' => 1 },
    );
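To illustrate why this layout makes comparisons easier, here is a minimal sketch that finds the genes two groups have in common. The helper name shared_genes is my own invention for illustration, not something from your code:

```perl
use strict;
use warnings;

# Same data as the %hGroups shown above.
my %hGroups = (
    'Group1' => { 'ATRG7' => 1, 'ATG2' => 1, 'ATG4' => 1, 'ATG1' => 1 },
    'Group3' => { 'FYCO1' => 1, 'LSM2' => 1 },
    'Group2' => { 'ATG9' => 1, 'ATG1' => 1 },
);

# shared_genes is a hypothetical helper (not part of the original post).
# It takes two member-hash references and returns the sorted list of
# genes that appear in both.
sub shared_genes {
    my ($hA, $hB) = @_;
    my @aCommon = grep { exists $hB->{$_} } keys %$hA;
    my @aSorted = sort @aCommon;
    return @aSorted;
}

my @aShared = shared_genes($hGroups{'Group1'}, $hGroups{'Group2'});
print "Group1 and Group2 share: @aShared\n";
# prints: Group1 and Group2 share: ATG1
```

No string parsing is needed: each membership check is one exists call.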

If your data is given to you as comma-delimited gene lists, you will need to convert %hash first. You can use map, split, and keys to do the conversion:

    my %hGroups = map {
        my $sGenes = $hash{$_};
        my $hGroupMembers = { map { $_ => 1 } split(',', $sGenes) };
        $_ => $hGroupMembers;
    } keys %hash;
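One thing to watch for: split(',', ...) keeps any spaces around the commas as part of the gene names. Since I have not seen your actual %hash, the sample data below is only my guess at its shape. If your lists might contain stray spaces, a conversion along these lines may be safer:

```perl
use strict;
use warnings;

# ASSUMED shape of the original %hash (it is not shown in the thread):
# group name => comma-delimited gene list, possibly with stray spaces.
my %hash = (
    'Group1' => 'ATRG7, ATG2,ATG4 ,ATG1',
    'Group2' => 'ATG9,ATG1',
    'Group3' => 'FYCO1,LSM2',
);

my %hGroups;
foreach my $sGroup (keys %hash) {
    # Copy the list and trim leading/trailing whitespace.
    (my $sGenes = $hash{$sGroup}) =~ s/^\s+|\s+$//g;

    # Split on commas, swallowing any whitespace around them.
    $hGroups{$sGroup} = { map { $_ => 1 } split /\s*,\s*/, $sGenes };
}

print join(',', sort keys %{ $hGroups{'Group1'} }), "\n";
# prints: ATG1,ATG2,ATG4,ATRG7
```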

Once you have your data in hash-of-hashes form, regrouping can be done easily with the help of exists. In this code sample, %hGroups is the hash of hashes above, and %hNewGroups will store the new groupings:

    my %hNewGroups;

    oldgroup:
    foreach my $sGroup (keys %hGroups) {
        my $hGroupMembers = $hGroups{$sGroup};

        # check each new group for genes in common with
        # current old group ($sGroup)
        foreach my $sNewGroup (keys %hNewGroups) {

            # check genes in the old group to see if any are
            # in the new group ($sNewGroup).
            # Note: use exists to prevent auto-vivification
            # (automatic adding) of $sGene to the members hash
            my $hNewGroupMembers = $hNewGroups{$sNewGroup};
            foreach my $sGene (keys %$hGroupMembers) {
                if (exists($hNewGroupMembers->{$sGene})) {
                    $hNewGroups{$sNewGroup}
                        = { %$hNewGroupMembers , %$hGroupMembers };
                    next oldgroup;
                }
            }
        }

        # create a new group, since no gene is in common with
        # other groups found so far
        $hNewGroups{$sGroup} = $hGroupMembers;
    }
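A caveat about the loop above: it stops at the first new group that shares a gene. If an old group happens to overlap two new groups that were created earlier, those two stay separate, and which case you hit can depend on hash ordering. If your data can contain such chains, a variant like the following sketch may help. The sub name merge_groups and the GroupA/GroupB/GroupC sample data are made up for illustration:

```perl
use strict;
use warnings;

# merge_groups is a hypothetical helper, not code from the original post.
# It takes a reference to a hash of hashes (group name => { gene => 1 })
# and returns a new hash of hashes in which groups that share a gene,
# directly or through a chain of other groups, have been combined.
sub merge_groups {
    my ($hGroups) = @_;
    my %hNewGroups;

    foreach my $sGroup (sort keys %$hGroups) {
        # Copy the members so the input is never modified.
        my %hMerged = %{ $hGroups->{$sGroup} };
        my $sTarget = $sGroup;

        # Absorb every new group that overlaps, not just the first one.
        foreach my $sNewGroup (sort keys %hNewGroups) {
            my $hNewMembers = $hNewGroups{$sNewGroup};
            next unless grep { exists $hNewMembers->{$_} } keys %hMerged;

            %hMerged = (%hMerged, %$hNewMembers);
            # Keep the alphabetically first name for the combined group.
            $sTarget = $sNewGroup if $sNewGroup lt $sTarget;
            delete $hNewGroups{$sNewGroup};
        }
        $hNewGroups{$sTarget} = \%hMerged;
    }
    return %hNewGroups;
}

# Made-up sample data: GroupC links GroupA and GroupB together.
my %hChain = (
    'GroupA' => { 'g1' => 1, 'g2' => 1 },
    'GroupB' => { 'g3' => 1, 'g4' => 1 },
    'GroupC' => { 'g2' => 1, 'g3' => 1 },
);

my %hResult = merge_groups(\%hChain);
foreach my $sName (sort keys %hResult) {
    print "$sName: ", join(',', sort keys %{ $hResult{$sName} }), "\n";
}
# prints: GroupA: g1,g2,g3,g4
```

A single pass over the new groups is enough here because the new groups never share genes with each other, so anything absorbed from one cannot create a fresh overlap with another. Sorting the keys just makes the surviving group name predictable.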

The contents of %hNewGroups will be something like this:

    %hNewGroups = (
        'Group1' => { 'ATG9' => 1, 'ATG2' => 1, 'ATRG7' => 1, 'ATG4' => 1, 'ATG1' => 1 },
        'Group3' => { 'FYCO1' => 1, 'LSM2' => 1 },
    );

You can always get back to comma delimited lists later on by using code like this:

    while (my ($sNewGroup, $hMembers) = each(%hNewGroups)) {
        print "$sNewGroup: " . join(',', keys %$hMembers) . "\n";
    }

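which prints out something like this (Perl hashes are unordered, so the order of the groups and of the genes within each group may differ):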

    Group1: ATG9,ATG2,ATRG7,ATG4,ATG1
    Group3: FYCO1,LSM2
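If you need the same output on every run, you can sort both the group names and the genes before printing. Here is a small sketch; format_groups is a name I made up for this example:

```perl
use strict;
use warnings;

# Same contents as the %hNewGroups shown above.
my %hNewGroups = (
    'Group1' => { 'ATG9' => 1, 'ATG2' => 1, 'ATRG7' => 1, 'ATG4' => 1, 'ATG1' => 1 },
    'Group3' => { 'FYCO1' => 1, 'LSM2' => 1 },
);

# format_groups is a hypothetical helper (not part of the original post).
# It returns one "name: gene,gene,..." line per group, with group names
# and genes both in sorted order.
sub format_groups {
    my ($hGroups) = @_;
    my @aLines;
    foreach my $sGroup (sort keys %$hGroups) {
        my @aGenes = sort keys %{ $hGroups->{$sGroup} };
        push @aLines, "$sGroup: " . join(',', @aGenes);
    }
    return @aLines;
}

print "$_\n" for format_groups(\%hNewGroups);
# prints:
# Group1: ATG1,ATG2,ATG4,ATG9,ATRG7
# Group3: FYCO1,LSM2
```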

Best, beth


In reply to Re: Should I use a hash for this? by ELISHEVA
in thread Should I use a hash for this? by awos22
