Re: mismatching characters in dna sequence

This isn't built around PDL, but I'd be surprised if there's not a simple way to adapt it to work with your PDL objects.


use strict;
use warnings;

my $target = "ATTCCGGG";

my @tests = qw/ ATTGCGGG ATACCGGC /;



foreach my $test ( @tests ) {

    print "Testing: Target: $target\tTest: $test\n";

    my @results = map {
        my ( $target_c, $test_c ) = (
            substr( $target, $_, 1 ),
            substr(   $test, $_, 1 )
        );
        ( $target_c ne $test_c ) ? [ $_, $target_c, $test_c ] : ();
    } 0 .. length( $target ) - 1;

    print "\t$_->[0]: $_->[1] -> $_->[2]\n" for @results;

}
[download]

Output:

Testing: Target: ATTCCGGG       Test: ATTGCGGG
        3: C -> G
Testing: Target: ATTCCGGG       Test: ATACCGGC
        2: T -> A
        7: G -> C
[download]

Dave

Comment on Re: mismatching characters in dna sequence Select or Download Code

Replies are listed 'Best First'.
Re^2: mismatching characters in dna sequence by prbndr (Acolyte) on Dec 30, 2011 at 02:16 UTC
^^how fast would this be for millions of sequences? also, here's my PDL code: #!/opt/local/bin/perl use Bio::Seq; use Bio::DB::Sam; use PDL; use PDL::Char; $PDL::SHARE = $PDL::SHARE; # Load bam file my $bamfile = Bio::DB::Sam->new( -bam => "eg1.bam" ); my @allReads = $bamfile->features( -type => 'match' ); for my $read (@allReads) { # Get read and target sequences ( $ref, $matches, $query ) = $read->padded_alignment; my $target = PDL::Char->new("$ref"); for my $str ("$query") { my $match = PDL::Char->new($str); my @diff = which( $match != $target )->list; # Print read name and mismatching positions print $read->qname,': ', "@diff\n"; } } [download]	[reply] [d/l]
Re^2: mismatching characters in dna sequence by prbndr (Acolyte) on Dec 30, 2011 at 03:26 UTC
these are actually quite slow for even 10,000 comparisons. i think it has to do with the use of substr. any suggestions on how to scale up the performance?	[reply]