Hi all, I understand this should need just a minor tweak, but I couldn't come up with a sensible idea.
What I really want is to find the position of each mismatch (and the actual mismatch) between the Query and the Sbjct in the input file.

I have attached my code and the input file that it needs. My code works fine if the line it works on has just fifteen characters (including spaces) to be removed at the beginning of the alignment (i.e. Sbjct\s+\d{5}). But in some cases there are sixteen characters, and then it fails and gives a wrong mismatch position!

For example,
Case 1, with 15 characters (including spaces), counted from Sbjct to the end of the number before the alignment starts:

Query  10550   CTTGGTTAGTACTGAATCCCATATATACTATGTTTTTCCTATACATATGTACTTATGATA  10609
               ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  74391   CTTGGATAGTACTGAATCCCATATATACTATGTTTTTCCTATACATATGTACTTATGATA  74332

Since I have set the substr to cut at 15, it goes wrong when there are 16 characters (including spaces).
Case 2, with 16 characters:

Query  16319    CCCACTCGGGCCCGGCTCCAGCTCCTGCACCGCCTGGGCCAGCCTCCGCATGTTAAGGGC  16378
                ||||||||||||| |||||||||||||||||||||||||||||| |||||||||||||||
Sbjct  140831   CCCACTCGGGCCCCGCTCCAGCTCCTGCACCGCCTGGGCCAGCCACCGCATGTTAAGGGC  14077

Here the Sbjct has a 6-digit number, and hence the alignment is shifted by one space. It starts after 16 characters rather than after 15 as in the previous case. This results in wrong positions.
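One way to avoid hard-coding 15 or 16 would be to let a regex report where the sequence starts on each line. The following is only a rough sketch: `seq_offset` is a hypothetical helper that is not part of my script, and it assumes the Query and Sbjct lines of one block are padded to the same width, so the offset found on the Query line can also be used for the pipes line.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the 0-based
# column where the aligned sequence starts, plus the start coordinate.
sub seq_offset {
    my ($line) = @_;
    # label, whitespace, start coordinate, whitespace, then the sequence
    if ( $line =~ /^(?:Query|Sbjct)\s+(\d+)\s+/ ) {
        return ( $+[0], $1 );    # $+[0] is the offset just past the match
    }
    return;
}

my @lines = (
    'Sbjct  74391   CTTGGATAGT',     # 5-digit coordinate
    'Sbjct  140831   CCCACTCGGG',    # 6-digit coordinate
);
for my $line (@lines) {
    my ( $offset, $start ) = seq_offset($line);
    print "start=$start offset=$offset seq=", substr( $line, $offset ), "\n";
}
```

With the two sample lines above this reports an offset of 15 for the 5-digit case and 16 for the 6-digit case, so the same `substr` call would work on both.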
My code
use Bio::EnsEMBL::Registry;

my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);
my $home = $ENV{'HOME'};
my($ID, $query, $off, $idi, $subject, $ref, $st);
print "ID\tposition\tvariation\tRef Genome coordinates\n";
unless(open DATA, "Input_files/Contig_Alignment_Selected_3.txt"){
    die "Cannot open the file $! \n";
}
while(<DATA>) {
    chomp;
    if(m[^>]) {    # checks the start of the alignments
        ($ID)  = (split '\|',$_)[1];    # splits the first line on '|'
        ($ref) = $ID =~ /(\d+)\s+ref$/;
    }
    if(/^\s+Identities/){    # gets the percentage of identity
        my($identity, undef) = split/,/ ;
        ($idi) = $identity =~ /\sIdentities\s\=\s\d{3}\/\d{3}\s\((\d{2,3}\%)\)$/;
    }
    if(/^\s+Strand/){    # checks the strands, Plus/Minus
        ($st) = $_ =~/^\s\w+\=\w{4}\/(\w{4,5})$/;
    }
    if(m/^Query/) {
        ($query) = m[^Query\s+(\d+)];
        my $top   = substr $_, 15;       # drop the first 15 characters
        my $pipes = substr <DATA>,15;    # same; if the Sbjct number has more than 5 digits this doesn't work
        my $subject = <DATA>;
        my($value) = $subject =~ /^Sbjct\s+(\d+)/;
        my $bot = substr $subject, 15;   # if the Sbjct number has more than 5 digits this doesn't work
        my $p = 0 ;
        while ($p = 1+index $pipes,' ', $p) {
            my $pos1 = $value-$p;
            my $pos2 = $value+$p;
            my $var1 = substr( $top, $p-1, 1 );
            my $var2 = substr( $bot, $p-1, 1 );
            # my $genomref1 = 4900000 + $pos1;
            my $genomref2 = 4899999 + $pos2;
            if($st eq "Minus") {
                print join"\t", $ref,$pos1, $var1."/".$var2,$genomref2 ;
                snpdetails($genomref2);    # snpdetails() is not shown in this post
            }else{
                print join "\t", $ref,$pos2, $var1."/".$var2,$genomref2;
                snpdetails($genomref2);
            }
        }
    }
}    # end of while(<DATA>)


Input file
BLASTN 2.2.24+

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14.

RID: 5ZHMGK7311R

Query= NODE_16_length_35408_cov_15.061031
Length=35478

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

lcl|14079 ref|NC_000009.11|:4900000-5300000 Homo sapiens chro...  1.655e+04  0.0

ALIGNMENTS
>lcl|14079 ref|NC_000009.11|:4900000-5300000 Homo sapiens chromosome 9, GRCh37 primary reference assembly
Length=400001

 Score = 1.655e+04 bits (8960), Expect = 0.0
 Identities = 9014/9037 (99%), Gaps = 15/9037 (0%)
 Strand=Plus/Minus

Query  10190   TGGAGTGCAGTGGCGCAATCTCGGCTCACTGCAAGCATCGCCTCCTGGGTTCACGCCATT  10249
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  74751   TGGAGTGCAGTGGCGCAATCTCGGCTCACTGCAAGCATCGCCTCCTGGGTTCACGCCATT  74692

Query  10250   CTCCTGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCATCTGCCACCATGCCCCACTAA  10309
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  74691   CTCCTGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCATCTGCCACCATGCCCCACTAA  74632

Query  10310   ttttttctattttttAGTAGAGACGGGGTTTCACCATGTTAGCCAGGATGGTCTCGATCT  10369
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  74631   TTTTTTCTATTTTTTAGTAGAGACGGGGTTTCACCATGTTAGCCAGGATGGTCTCGATCT  74572

Query  10370   CCTGACCTCGTGATCCGCCCACCTCAGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCC  10429
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  74571   CCTGACCTCGTGATCCGCCCACCTCAGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCC  74512

Query  36624    aTGTTTTGAGCATATAGGGAAAATTTATAAAAATTGGCCATGATGaaacataagctcaaa  36683
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  100670   ATGTTTTGAGCATATAGGGAAAATTTATAAAAATTGGCCATGATGAAACATAAGCTCAAA  100611

Query  36684    aagtttaaaaagaaaactcctaaaagttggcataacaaagcctaaaaaTCATTTCAAACT  36743
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  100610   AAGTTTAAAAAGAAAACTCCTAAAAGTTGGCATAACAAAGCCTAAAAATCATTTCAAACT  100551

Query  36744    TGGTATAACTGTTACTAGAAAACCATCTACACAATGACTATATATATGCCTTTATTTCAT  36803
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  100550   TGGTATAACTGTTACTAGAAAACCATCTACACAATGACTATATATATGCCTTTATTTCAT  100491

Query  36804    TTTTATGTTACGCTTCTCTTTATATTTGAATCATTCCTTTAAACTACATAAACATTTTCA  36863
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  100490   TTTTATGTTACGCTTCTCTTTATATTTGAATCATTCCTTTAAACTACATAAACATTTTCA  100431

Query  36864    AGTGTTTGTAAATACCCTTTTAAAAATTACTGCTGTTAGCTGTTCTTCATGATTTTCTTA  36923
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  100430   AGTGTTTGTAAATACCCTTTTAAAAATTACTGCTGTTAGCTGTTCTTCATGATTTTCTTA  100371

Query  36924    CTGGTCTCCTTACACATTCGAAATTGGACATTTCCGACTATTTCCTTGGTATGTTTTATA  36983
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  100370   CTGGTCTCCTTACACATTCGAAATTGGACATTTCCGACTATTTCCTTGGTATGTTTTATA  100311

All I need is to be able to get the right position in either case, 15 or 16 characters (Sbjct 74571 / Sbjct 100370).
I appreciate all your help and suggestions. Thanks in advance for your time.
Regards
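
For reference, another approach that might work for either width is to capture the sequences directly and compare them column by column, instead of cutting at a fixed column and searching the pipes line. This is a sketch under assumptions, not a tested replacement for my script: `mismatches` is a hypothetical helper, the number after Query/Sbjct is taken to be the coordinate of the first base on that line, a `-` is treated as a gap that consumes no coordinate, and lower-case (masked) bases are compared case-insensitively.

```perl
use strict;
use warnings;

# Hypothetical helper: compares one Query/Sbjct pair and returns a list of
# [query position, subject position, "query base/subject base"] entries.
# $strand is the subject strand from the "Strand=Plus/Minus" line.
sub mismatches {
    my ( $query_line, $sbjct_line, $strand ) = @_;
    my ( $q_pos, $q_seq ) = $query_line =~ /^Query\s+(\d+)\s+(\S+)/ or return;
    my ( $s_pos, $s_seq ) = $sbjct_line =~ /^Sbjct\s+(\d+)\s+(\S+)/ or return;

    my $step = $strand eq 'Minus' ? -1 : 1;    # subject counts down on Minus
    my @found;
    for my $i ( 0 .. length($q_seq) - 1 ) {
        my $q = substr $q_seq, $i, 1;
        my $s = substr $s_seq, $i, 1;
        push @found, [ $q_pos, $s_pos, "$q/$s" ] if uc $q ne uc $s;
        # a gap consumes no coordinate on the sequence that has it
        $q_pos++        if $q ne '-';
        $s_pos += $step if $s ne '-';
    }
    return @found;
}

# Shortened versions of the two cases from this post
my @cases = (
    [ 'Query  10550   CTTGGTTAGT',        'Sbjct  74391   CTTGGATAGT' ],
    [ 'Query  16319    CCCACTCGGGCCCGGCT', 'Sbjct  140831   CCCACTCGGGCCCCGCT' ],
);
for my $case (@cases) {
    print join( "\t", @$_ ), "\n" for mismatches( @$case, 'Minus' );
}
```

Because the regex skips however many spaces and digits are present, the 5-digit and 6-digit Sbjct lines are handled the same way. Note that the positions come out one base different from my `$value-$p` arithmetic, since here the first base of the line is counted at the start coordinate itself.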

In reply to finding the position by Anonymous Monk
