Hi there fellow mongers, I am working on a clustering process. I have to parse a file and group its lines on the basis of relatedness: any two genes that appear on the same line belong to the same cluster. In terms of an example, what I want to do is this.

Input file: example.txt

Gene1,Gene2,spc1,spc2
Gene3,Gene1,spc1,spc2,spc4
Gene4,Gene1,spc1,spc2,spc5,spc3,spc1
Gene2,Gene3,spc1,spc2
Gene2,Gene4,spc2,spc3
Gene3,Gene4,spc1,spc2
GeneA,GeneB,spc4,spc5
GeneB,GeneC,spc1,spc2
GeneC,GeneD,spc1,spc2
GeneD,GeneE,spc4,spc2
GeneE,GeneF,spc3,spc1
GeneX,GeneY,spc6,spc8
GeneX,GeneP,spc6,spc7

My desired output is:
Gene1,Gene2,Gene3,Gene4,spc1,spc2,spc1,spc2,spc4,spc1,spc2,spc5,spc1,spc2,spc2,spc3,spc1,spc2
GeneA,GeneB,GeneC,GeneD,GeneE,GeneF,spc4,spc5,spc1,spc2,spc1,spc2,spc4,spc2,spc3,spc1
GeneX,GeneY,GeneP,spc6,spc8,spc6,spc7


Currently I am working only on the first half of the problem. All I am trying to do is get the Gene"X" entries to cluster.



#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0];
my %GFs  = get_gene_families($file);

# Each key is the first gene of a family; its value is ",gene,gene,..."
foreach my $key (keys(%GFs)) {
    print $key . $GFs{$key} . "\n";
}
exit;

sub get_gene_families {
    my $fileName = $_[0];
    my %hash     = ();
    open(IFILE, "$fileName") or die "Couldn't open: $fileName\n";
    while (my $line = <IFILE>) {
        my @genes = split(/\,/, $line);
        my $new   = 1;
        foreach my $key (keys(%hash)) {
            # Members of this family: the stored list plus the key itself
            my @split = split(/\,/, $hash{$key});
            push(@split, $key);
            # Both genes already present: nothing to add
            if (contains($genes[0], \@split) && contains($genes[1], \@split)) {
                $new = 0;
            }
            # Only the first gene present: add the second
            if (contains($genes[0], \@split) && !contains($genes[1], \@split)) {
                $hash{$key} .= "," . $genes[1];
                $new = 0;
            }
            # Only the second gene present: add the first
            if (!contains($genes[0], \@split) && contains($genes[1], \@split)) {
                $hash{$key} .= "," . $genes[0];
                $new = 0;
            }
        }
        # Neither gene found in any family: start a new one
        if ($new) {
            $hash{$genes[0]} .= "," . $genes[1];
        }
    }
    close IFILE;
    return %hash;
}

# Returns 1 if $target is an element of the referenced array, else 0
sub contains {
    my $target = $_[0];
    my @array  = @{$_[1]};
    foreach my $element (@array) {
        if ($element eq $target) {
            return 1;
        }
    }
    return 0;
}

But the output I am getting is:

Gene1,Gene2,Gene3,Gene4
GeneB,GeneC
Gene2,Gene3,Gene4
GeneE,GeneF
GeneD,GeneE

,
GeneC,GeneD
GeneA,GeneB
Gene3,Gene4
GeneX,GeneY,GeneP
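
A note on the empty line followed by a lone "," in this output: it is consistent with a blank line in the input file, although the post does not confirm that one is present. The lines are never chomped, so a blank line splits into the single field "\n", which then becomes a hash key whose value is ",". A minimal sketch of that effect, assuming one blank input line:

```perl
use strict;
use warnings;

# Assumption: the input contained a blank line, read without chomp
my $line  = "\n";
my @genes = split(/\,/, $line);    # one field only: "\n"

my %hash;
# Same append as in get_gene_families(); the second gene is missing here,
# so an empty string stands in for it
$hash{ $genes[0] } .= "," . ( defined $genes[1] ? $genes[1] : '' );

# Printing key . value . "\n" gives an empty line, then a line holding ","
my $printed = $genes[0] . $hash{ $genes[0] } . "\n";
print $printed;
```

Chomping each line and skipping lines that contain no non-whitespace characters would avoid this entry.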



Can anybody please help? I have been stuck on this problem for over two weeks now, and I have not yet managed to solve even the first half of it.
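
For reference, the grouping described above (genes linked directly or through a chain of shared partners end up in one family) is a connected-components problem. The following is only a sketch of one possible approach, using a union-find structure instead of the hash-of-strings method in the posted code. The names find_root and cluster_lines are hypothetical, and the sketch assumes every data line has the form gene,gene,spc,... as in example.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: find_root() and cluster_lines() are hypothetical helper names.
my %parent;    # gene => parent gene in the union-find forest

sub find_root {
    my ($gene) = @_;
    $parent{$gene} = $gene unless exists $parent{$gene};
    # Walk up to the root of this gene's tree
    my $root = $gene;
    $root = $parent{$root} while $parent{$root} ne $root;
    # Path compression: point every visited gene straight at the root
    while ($parent{$gene} ne $root) {
        my $next = $parent{$gene};
        $parent{$gene} = $root;
        $gene = $next;
    }
    return $root;
}

sub cluster_lines {
    my @lines = @_;
    %parent = ();
    my (@order, %seen, @rows);
    for my $line (@lines) {
        chomp(my $copy = $line);
        next unless $copy =~ /\S/;               # skip blank lines
        my ($g1, $g2, @spc) = split /,/, $copy;
        next unless defined $g2;                 # skip malformed lines
        for my $g ($g1, $g2) {
            push @order, $g unless $seen{$g}++;  # remember first-seen order
        }
        # Merge the families of the two genes on this line
        my ($r1, $r2) = (find_root($g1), find_root($g2));
        $parent{$r2} = $r1 if $r1 ne $r2;
        push @rows, [$g1, \@spc];
    }
    # Group genes and spc fields by their final root
    my (%genes, %spcs, @roots, %root_seen);
    for my $g (@order) {
        my $r = find_root($g);
        push @roots, $r unless $root_seen{$r}++;
        push @{ $genes{$r} }, $g;
    }
    for my $row (@rows) {
        my $r = find_root($row->[0]);
        push @{ $spcs{$r} }, @{ $row->[1] };
    }
    return map { join ',', @{ $genes{$_} }, @{ $spcs{$_} || [] } } @roots;
}

# Usage: perl cluster.pl example.txt
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Couldn't open $ARGV[0]: $!\n";
    print "$_\n" for cluster_lines(<$in>);
    close $in;
}
```

Because families are merged after all lines have been read, the order of the input lines does not affect which genes end up together. Genes are listed in the order they first appear, and the spc fields of each family are appended in the order their lines were read.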

In reply to Clustering with Perl by nerve
