in reply to parsing mismatch from blast output
I couldn't find any files in the right format. I tried blastn at NCBI against NC_000009.11, but after 45 minutes of waiting it reported "Informational Message: blastsrv4.REAL: Error: CPU usage limit was exceeded, resulting in SIGXCPU (24).", which probably just goes to show I don't know what the .... I'm doing.
So I based this upon your sample:
```perl
#! perl -slw
use strict;

my( $ID, $query, $off );

while( <DATA> ) {
    # Header line: the second '|'-separated field becomes the ID ("14079 ref").
    if( m[^>] ) {
        ( $ID ) = (split '\|', $_)[ 1 ];
        next;
    }
    # Alignment block: a Query line, then the match line, then the Sbjct line.
    if( m[^Query] ) {
        ( $query ) = m[^Query\s+(\d+)];
        my $top   = substr $_, 15;
        my $pipes = substr <DATA>, 15;
        my $bot   = substr <DATA>, 15;
        my $p = 0;
        # Each space in the match line marks a mismatch; index() returns -1
        # when no space remains, so 1+index is 0 and the loop ends.
        while( $p = 1+index $pipes, ' ', $p ) {
            printf "%20s :(%d) %1s/%1s\n",
                $ID, $query+$p, substr( $top, $p-1, 1 ), substr( $bot, $p-1, 1 );
        }
    }
}

__DATA__
>lcl|14079 ref|NC_000009.11|:4900000-5300000 Homo sapiens chromosome 9, GRCh37 primary reference assembly
Length=400001

Score = 270 bits (146), Expect = 2e-74
Identities = 148/149 (99%), Gaps = 0/149 (0%)
Strand=Plus/Minus

Query  1      TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  48784  TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  48725

Query  61     AATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  120
              |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Sbjct  48724  AATGGGATCTAATTCAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  48665

Query  121    ACAGGCAACCTACAGAATGGGAGAACATT  149
              |||||||||||||||||||||||||||||
Sbjct  48664  ACAGGCAACCTACAGAATGGGAGAACATT  48636

>lcl|14080 ref|NC_000009.11|:4900000-5300000 Homo sapiens chromosome 9, GRCh37 primary reference assembly
Length=400001

Score = 270 bits (146), Expect = 2e-74
Identities = 148/149 (99%), Gaps = 0/149 (0%)
Strand=Plus/Minus

Query  1      TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  60
              |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Sbjct  48784  TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  48725

Query  61     AATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  120
              ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Sbjct  48724  AATGGGATCTAATTCAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  48665

Query  121    ACAGGCAACCTACAGAATGGGAGAACATT  149
              ||| |||| |||||| |||||||||||
Sbjct  48664  ACAGGCAACCTACAGAATGGGAGAACATT  48636

>lcl|14081 ref|NC_000009.11|:4900000-5300000 Homo sapiens chromosome 9, GRCh37 primary reference assembly
Length=400001

Score = 270 bits (146), Expect = 2e-74
Identities = 148/149 (99%), Gaps = 0/149 (0%)
Strand=Plus/Minus

Query  1      TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  48784  TGGGCAAGGACTTCATGTCTAAAACACCAAAAGCAATGGCAACAAAAGCCAAAATTGACA  48725

Query  61     AATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  120
              |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Sbjct  48724  AATGGGATCTAATTCAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  48665

Query  121    ACAGGCAACCTACAGAATGGGAGAACATT  149
              |||||||||||||||||||||||||||||
Sbjct  48664  ACAGGCAACCTACAGAATGGGAGAACATT  48636
```
Outputs:
```
c:\test\blast>..\853819.pl
           14079 ref :(75) A/C
           14080 ref :(15) A/A
           14080 ref :(90) T/T
           14080 ref :(124) G/G
           14080 ref :(129) C/C
           14080 ref :(136) A/A
           14081 ref :(75) A/C
```
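The script relies on the sequence text starting at a fixed column: `substr $_, 15` only fits the 14-character `Query  1      ` / `Sbjct  48784  ` prefix in this sample, and because it skips that first column, a mismatch on the first base of a block would not be reported. If BLAST widens the prefix for longer coordinates (an assumption, not checked here), the offset would shift as well. A sketch that takes the column from the Query line itself, via the `@-` match-offset array, might look like this. `block_mismatches` is a hypothetical helper name; the sketch assumes ungapped, column-aligned blocks like the sample (Gaps = 0/149) and, like the script, reports query coordinates only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): given the Query,
# match and Sbjct lines of one BLAST pairwise block, return
# [query_coordinate, query_base, subject_base] for each column whose
# match character is not '|'.
# Assumes an ungapped block whose three lines are column-aligned.
sub block_mismatches {
    my( $qline, $mline, $sline ) = @_;
    chomp( $qline, $mline, $sline );

    # Take the start coordinate, the sequence, and the column where the
    # sequence begins ($-[2]) from the Query line instead of a fixed offset.
    $qline =~ m[^Query\s+(\d+)\s+([A-Za-z-]+)]
        or die "not a Query line: $qline\n";
    my( $start, $qseq, $col ) = ( $1, $2, $-[2] );

    my $len   = length $qseq;
    my $match = length( $mline ) > $col ? substr( $mline, $col, $len ) : '';
    my $sseq  = substr( $sline, $col, $len );

    # Every non-'|' column, including the first one, counts as a mismatch.
    my @found;
    for my $i ( 0 .. length( $match ) - 1 ) {
        next if substr( $match, $i, 1 ) eq '|';
        push @found, [ $start + $i, substr( $qseq, $i, 1 ), substr( $sseq, $i, 1 ) ];
    }
    return @found;
}

# Usage: the Query 61 block of record 14079 from the sample above.
my $q = 'Query  61     AATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  120';
my $m = ' ' x 14 . '|' x 14 . ' ' . '|' x 45;
my $s = 'Sbjct  48724  AATGGGATCTAATTCAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGA  48665';

printf "%d %s/%s\n", @$_ for block_mismatches( $q, $m, $s );   # prints "75 A/C"
```

With gaps in the alignment, the query coordinate would have to advance only on non-`-` characters; the sample has none, so the sketch does not attempt that.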