in reply to mismatching characters in dna sequence

This isn't built around PDL, but I'd be surprised if there's not a simple way to adapt it to work with your PDL objects.

use strict; use warnings; my $target = "ATTCCGGG"; my @tests = qw/ ATTGCGGG ATACCGGC /; foreach my $test ( @tests ) { print "Testing: Target: $target\tTest: $test\n"; my @results = map { my ( $target_c, $test_c ) = ( substr( $target, $_, 1 ), substr( $test, $_, 1 ) ); ( $target_c ne $test_c ) ? [ $_, $target_c, $test_c ] : (); } 0 .. length( $target ) - 1; print "\t$_->[0]: $_->[1] -> $_->[2]\n" for @results; }

Output:

Testing: Target: ATTCCGGG Test: ATTGCGGG 3: C -> G Testing: Target: ATTCCGGG Test: ATACCGGC 2: T -> A 7: G -> C

Dave

Replies are listed 'Best First'.
Re^2: mismatching characters in dna sequence
by prbndr (Acolyte) on Dec 30, 2011 at 02:16 UTC

    ^^how fast would this be for millions of sequences? also, here's my PDL code:

    #!/opt/local/bin/perl use Bio::Seq; use Bio::DB::Sam; use PDL; use PDL::Char; $PDL::SHARE = $PDL::SHARE; # Load bam file my $bamfile = Bio::DB::Sam->new( -bam => "eg1.bam" ); my @allReads = $bamfile->features( -type => 'match' ); for my $read (@allReads) { # Get read and target sequences ( $ref, $matches, $query ) = $read->padded_alignment; my $target = PDL::Char->new("$ref"); for my $str ("$query") { my $match = PDL::Char->new($str); my @diff = which( $match != $target )->list; # Print read name and mismatching positions print $read->qname,': ', "@diff\n"; } }
Re^2: mismatching characters in dna sequence
by prbndr (Acolyte) on Dec 30, 2011 at 03:26 UTC

    these are actually quite slow for even 10,000 comparisons. i think it has to do with the use of substr. any suggestions on how to scale up the performance?