hi monks,

i'm still relatively new at perl but getting better with practice. my question for now is how to compare two strings of the same length (say, about 40 characters) and count the mismatching characters. take, for example, the following target and two strings:

target = ATTCCGGG
str1   = ATTGCGGG
str2   = ATACCGGC

i would like to compare str1 to "target" and str2 to "target" and record the type of mismatch at each mismatching position (counting positions from 0). comparing str1 to target gives one mismatch at position 3, which is a C->G. comparing str2 to target gives two mismatches at positions 2 and 7, which are T->A and G->C respectively. is there an efficient way to do this for millions of different targets and strings?

i have the following code using PDL:

use PDL;
use PDL::Char;

$PDL::SHARE=$PDL::SHARE; # keep stray warning quiet

my $source=PDL::Char->new("ATTCCGGG");

for my $str ( "ATTGCGGG") {
  my $match =PDL::Char->new($str);
  my @diff=which($match!=$source)->list;
  print "@diff\n";
}

this code doesn't give me the specific types of mismatches though. an A,T,G, or C in the target can transform into an A,T,G, or C in the strings, so i would like to keep track of these conversions. any advice?

Original content restored above by GrandFather

oops...i deleted my post...

i'm trying to find the positions and types of differences between two strings. take the "target" and the strings:

$target = "ATTCCGGG";
$str1 = "ATTGCGGG"; # 1 mismatch with target at position 3 (C->G)
$str2 = "ATACCGGC"; # 2 mismatches with target at positions 2 and 7 (T->A and G->C)

how do i go about obtaining the differences between millions of targets and strings in an efficient manner? i have the following code using PDL:

use PDL;
use PDL::Char;

$PDL::SHARE=$PDL::SHARE; # keep stray warning quiet

my $source=PDL::Char->new("ATTCCGGG");

for my $str ( "ATTGCGGG", "ATACCGGC") {
  my $match =PDL::Char->new($str);
  my @diff=which($match!=$source)->list;
  print "@diff\n";
}

but this only gives me positions. how do i look for the actual conversions that occur too? any of A, T, C, or G in the target can convert to any other of A, T, C, or G in the string. any advice?
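one possible approach, sketched here in plain perl rather than PDL: a bytewise string XOR yields "\0" wherever the two strings agree, so scanning the XOR result for non-NUL bytes gives the positions, and substr on each string gives the conversion. the sub name mismatches and the %count tally are hypothetical names chosen for this illustration, and the sketch assumes equal-length, single-byte (ASCII) sequences. it has not been benchmarked against the PDL version.

```perl
use strict;
use warnings;

# Return one [position, target_char, string_char] triple per mismatch.
# Positions are 0-based, like the indices printed by the PDL code.
sub mismatches {
    my ($target, $str) = @_;
    die "strings differ in length\n" unless length($target) == length($str);

    # Bytewise XOR: "\0" wherever the two strings hold the same character.
    my $xor = $target ^ $str;

    my @found;
    while ($xor =~ /[^\0]/g) {
        my $pos = pos($xor) - 1;    # pos() points just past the matched byte
        push @found, [ $pos, substr($target, $pos, 1), substr($str, $pos, 1) ];
    }
    return @found;
}

my $target = "ATTCCGGG";
my %count;    # tally of conversion types across all strings, e.g. "C->G" => 1

for my $str ("ATTGCGGG", "ATACCGGC") {
    my @m = mismatches($target, $str);
    $count{ sprintf "%s->%s", $_->[1], $_->[2] }++ for @m;
    print join(", ", map { sprintf "%d: %s->%s", @$_ } @m), "\n";
}
# prints:
# 3: C->G
# 2: T->A, 7: G->C

print "$_ $count{$_}\n" for sort keys %count;
```

the %count hash accumulates how often each conversion type occurs, which is probably what you want when running over millions of pairs; the per-pair print is only there to show the positions.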

In reply to mismatching characters in dna sequence by prbndr
