The following is very close to your code, but it more or less works. You only need to format the output as desired (I don't know where the large numbers come from):

    #!/usr/bin/perl
    open(FILE, "align.input") or die "can not open file: $!";
    while ($var = <FILE>) {
        if ($var =~ /^sxoght:/) {
            ($str1, $str2) = ();
            @ar = split(/\s+/, $var);
            print ">$ar[2]\t$ar[8]\t$ar[9]\t$ar[1]\t$ar[5]\t$ar[10]\t$ar[6]\t$ar[3]\t$ar[4]\t$ar[11]\n";
        }
        if ($var =~ /^Query:/) {
            $str1 = $var;
            $str1 =~ s/^Query:\s+//g;
            $str1 =~ s/\d+\s+//g;
            $str1 =~ s/\s+//g;
        }
        if ($var =~ /^Sbjct:/) {
            $str2 = $var;
            $str2 =~ s/^Sbjct:\s+//g;
            $str2 =~ s/\d+\s+//g;
            $str2 =~ s/\s+//g;
        }
        if (defined $str1 and defined $str2) {
            # $i < length (not <=), so we never index past the last character
            for ($i = 0; $i < length($str1); $i++) {
                if (substr($str1, $i, 1) ne substr($str2, $i, 1)) {
                    # this is not in the desired format, yet
                    print substr($str1, $i, 1);
                    print substr($str2, $i, 1);
                    print "$i\n";
                }
            }
            ($str1, $str2) = ();
        }
    }

For the format, I propose to store the results in an array, and print "mismatch:" only if the array isn't empty at the end:

    my @mismatch;
    for ($i = 0; $i < length($str1); $i++) {
        if (substr($str1, $i, 1) ne substr($str2, $i, 1)) {
            push @mismatch, "$i." . substr($str1, $i, 1) . substr($str2, $i, 1);
        }
    }
    if (@mismatch) {
        print "mismatch: @mismatch\n";
    }

After that modification, the output I get for this file is:
    >hit tstart tend #query qstart matches qend score probability mismatches
    >gi|122939163|ref|NM_000165.3| 1595 1630 SNPSTER4_104_308EFAAXX:1:1:1694:128 1 35 36 -10 1.000000 1
    mismatch: 30.GA
    >gi|113412254|ref|XR_018775.1| 1578 1613 SNPSTER4_104_308EFAAXX:1:1:1608:94 1 36 36 0 0.090884 0
    mismatch: 3.GT 34.TG
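As a minimal, self-contained sketch of the array approach described above, the comparison can be wrapped in a subroutine and run under `use strict`. The name `mismatches_loop` and the two sample strings are my own, purely for illustration (they are not taken from align.input), and the sketch assumes both strings have the same length:

```perl
use strict;
use warnings;

# Compare two equal-length strings position by position and return
# the mismatches as "index.XY" strings (0-based index, X taken from
# the first string, Y from the second).
sub mismatches_loop {
    my ($s1, $s2) = @_;
    my @mismatch;
    for my $i (0 .. length($s1) - 1) {
        my $c1 = substr($s1, $i, 1);
        my $c2 = substr($s2, $i, 1);
        push @mismatch, "$i.$c1$c2" if $c1 ne $c2;
    }
    return @mismatch;
}

# Hypothetical sample data, not from the original input file.
my @found = mismatches_loop("ACGTACGT", "ACGAACGC");

# Print the line only when there is at least one mismatch.
print "mismatch: @found\n" if @found;   # prints "mismatch: 3.TA 7.TC"
```

Interpolating the array inside double quotes joins its elements with a single space, which is what gives the "mismatch: 3.GT 34.TG" layout seen in the output.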
p.s. There's a possible speed improvement: if you XOR (^) the two strings, you get a string with null bytes where they are the same and non-null bytes where they differ:

    my $xor = $str1 ^ $str2;
    while ($xor =~ /[^\0]/g) {
        my $i = pos($xor) - 1;    # or: $-[0]
        push @mismatch, "$i." . substr($str1, $i, 1) . substr($str2, $i, 1);
    }
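To try the XOR idea in isolation, here is a small runnable sketch. The subroutine name `mismatches_xor` and the sample strings are assumptions of mine for illustration only. It also assumes both arguments are plain strings of equal length: with strings of different lengths, Perl's string XOR behaves as if the shorter one were padded with zero bits, so the extra tail would be reported as mismatches.

```perl
use strict;
use warnings;

# Find mismatching positions by XORing the two strings: equal
# characters give "\0", differing characters give a non-null byte.
sub mismatches_xor {
    my ($s1, $s2) = @_;
    my $xor = $s1 ^ $s2;          # string XOR (both operands are strings)
    my @mismatch;
    while ($xor =~ /[^\0]/g) {
        my $i = $-[0];            # start offset of the current match
        push @mismatch, "$i." . substr($s1, $i, 1) . substr($s2, $i, 1);
    }
    return @mismatch;
}

# Hypothetical sample data, not from the original input file.
my @found = mismatches_xor("ACGTACGT", "ACGAACGC");
print "mismatch: @found\n" if @found;   # prints "mismatch: 3.TA 7.TC"
```

The regex engine only stops at the differing positions, so the Perl-level loop body runs once per mismatch instead of once per character, which is where the possible speedup would come from.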
In reply to Re^3: match and mismatch by bart, in thread match and mismatch by heidi