I have the following input file. It contains pairs of two IDs separated by a tab. Each line holds one pair, and each pair is a meaningful combination.

DATA

YP_01 NP_02

NP_02 YP_01

YP_01 NP_03

NP_03 YP_01

NP_02 NP_03

NP_03 NP_02

NP_04 NP_05

.....

Here is the code I am using to get the reciprocal best hit pairs from this input file. For example, in the input file above, YP_01\tNP_02 and NP_02\tYP_01 form a reciprocal best hit pair. NP_04\tNP_05 is not a reciprocal hit pair, because the file lacks the reverse combination, i.e. NP_05\tNP_04.

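# $results holds the path to the input file
open( my $fh, '<', $results ) or die "couldn't open the file '$results': $!";

my %unique = ();
while ( my $line = <$fh> ) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines
    # sort the two ids so that A\tB and B\tA share the same key
    $unique{ join( "\t", sort split /\t/, $line ) }++;
}
close $fh;

my @pairs     = ();
my @not_pairs = ();
for my $item ( sort keys %unique ) {
    if ( $unique{$item} > 1 ) {
        push @pairs, $item;        # seen in both directions
    }
    else {
        push @not_pairs, $item;    # seen only once
    }
}
print join( "\n", @pairs ), "\n";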

This gives the following result file:

YP_01 NP_02

YP_01 NP_03

NP_02 NP_03

The problem is that I now need to modify the code so that, instead of the result file above, it produces the following:

YP_01 NP_02

YP_01 NP_03

3

The idea is this: if YP_01 is paired with both NP_02 and NP_03, and NP_02 is also paired with NP_03, then I do not need the pair NP_02 and NP_03 printed in the output file, since both IDs already appear in combination with YP_01. But I do need to know whether the pair NP_02 and NP_03 existed, which is why I need a count at the end. The "3" shown above means that all three best hit pairs are present (YP_01 and NP_02; YP_01 and NP_03; NP_02 and NP_03). If the pair NP_02 and NP_03 were not present, the number should change to 2, showing that only 2 of the 3 possible comparisons exist.
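The counting rule described above could be sketched roughly as follows. This is only a sketch under assumptions: the post does not say how the "hub" ID (YP_01 in the example) is chosen, so here it is simply passed in by hand, and the sub names reciprocal_pairs and summarize_hub are made up for illustration. Unlike the original code, this version treats a pair as reciprocal only if both directions appear, so a line repeated in the same direction does not count.

```perl
use strict;
use warnings;

# Collect reciprocal pairs. Returns a hash ref whose keys are "A\tB"
# (A lt B), kept only when both "A\tB" and "B\tA" occur in the input.
sub reciprocal_pairs {
    my @lines = @_;
    my %seen;    # directed pairs exactly as they appear
    for my $line (@lines) {
        chomp $line;
        my ( $x, $y ) = split /\t/, $line;
        next unless defined $y;    # skip blank or malformed lines
        $seen{"$x\t$y"} = 1;
    }
    my %reciprocal;
    for my $key ( keys %seen ) {
        my ( $x, $y ) = split /\t/, $key;
        next if $x eq $y;
        next unless $seen{"$y\t$x"};
        $reciprocal{ $x lt $y ? "$x\t$y" : "$y\t$x" } = 1;
    }
    return \%reciprocal;
}

# For a given hub ID (an assumption: chosen by the caller), return the
# hub's pairs to print and a count of hub pairs plus any reciprocal
# pairs that exist among the hub's partners.
sub summarize_hub {
    my ( $hub, $reciprocal ) = @_;
    my @partners;
    for my $key ( keys %$reciprocal ) {
        my ( $x, $y ) = split /\t/, $key;
        push @partners, $y if $x eq $hub;
        push @partners, $x if $y eq $hub;
    }
    @partners = sort @partners;
    my @printed = map { "$hub\t$_" } @partners;
    my $count   = @partners;
    for my $i ( 0 .. $#partners ) {
        for my $j ( $i + 1 .. $#partners ) {
            # partners are sorted, so this matches the key order above
            $count++ if $reciprocal->{"$partners[$i]\t$partners[$j]"};
        }
    }
    return ( \@printed, $count );
}

# Sample data taken from the post
my @input = (
    "YP_01\tNP_02", "NP_02\tYP_01", "YP_01\tNP_03", "NP_03\tYP_01",
    "NP_02\tNP_03", "NP_03\tNP_02", "NP_04\tNP_05",
);
my ( $printed, $count ) = summarize_hub( 'YP_01', reciprocal_pairs(@input) );
print "$_\n" for @$printed;
print "$count\n";
```

With the sample data this prints the two YP_01 pairs followed by 3. If the NP_02/NP_03 lines are removed from the input, the final number becomes 2. How to pick the hub automatically (for example, the ID with the most partners) is left open, since the post does not specify it.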

Could anyone help me with this? Please let me know if you need further clarification.


In reply to help needed in modifying the code for counting possible combinations by BhariD
