Re: Filtering matches of near-perfect-matched DNA sequence pairs

I don't understand your list of conditions, and it's hard for me to tell how your code relates to them. Could you add some examples that meet and fail the various conditions?

I'm not up on biology terms. If your "starting material is 2 sequences, each 10 nucleotides long", does that mean that both inputs should be exactly 10 characters long and may contain only A, C, G and T? If so, then I'd start out like this:

use strict;
use warnings;
use Carp;

sub compare_TSD {
    my %tsd;
    # a hash slice is needed here: "my" cannot declare individual hash elements
    @tsd{qw/up dn/} = @_;
    croak "compare_TSD called without two defined values"
        unless ( defined $tsd{up} and defined $tsd{dn} );

    # collect a description of every problem found in either input
    my $return_status = '';
    for my $arg ( qw/up dn/ ) {
        my $str = $arg;
        $str .= " has unusable characters," if ( $tsd{$arg} =~ /[^ACGT]/ );
        $str .= " has wrong character count," if ( length( $tsd{$arg} ) != 10 );
        $return_status .= " $str" if ( $str ne $arg );
    }
    return $return_status if ( $return_status ne '' );

    return "perfect match" if ( $tsd{up} eq $tsd{dn} );
    # not sure what else needs to be checked...
}

By putting the logic into a subroutine, you can modularize it, work out the specs and unit tests for its intended behaviors, and make it reusable (not bound up inside any single larger script). You can work out what sorts of values it needs to return to any given caller, in order to make it easier for the caller to do its job with the result.
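To make the "unit tests" idea concrete, here is a minimal sketch using Test::More. It carries its own compact version of the sub so that the file runs by itself. The comma-joined message format and the "no match rule applied" fallback string are my own placeholders, not anything from your spec, so adjust them once the remaining conditions are pinned down.

```perl
use strict;
use warnings;
use Carp;
use Test::More;

# Compact stand-in for the validation sub, so this file runs by itself.
sub compare_TSD {
    my ( $up, $dn ) = @_;
    croak "compare_TSD called without two defined values"
        unless defined $up and defined $dn;

    my %tsd = ( up => $up, dn => $dn );
    my @problems;
    for my $arg ( qw/up dn/ ) {
        push @problems, "$arg has unusable characters"
            if $tsd{$arg} =~ /[^ACGT]/;
        push @problems, "$arg has wrong character count"
            if length( $tsd{$arg} ) != 10;
    }
    return join( ', ', @problems ) if @problems;

    return "perfect match" if $tsd{up} eq $tsd{dn};

    # Placeholder (assumption): the real near-match rules are not specified yet.
    return "no match rule applied";
}

# One test per behavior that is already known.
is( compare_TSD( 'ACGTACGTAC', 'ACGTACGTAC' ),
    'perfect match', 'identical inputs' );
like( compare_TSD( 'ACGTACGTAN', 'ACGTACGTAC' ),
    qr/up has unusable characters/, 'bad character in first input' );
like( compare_TSD( 'ACGTACGTAC', 'ACGT' ),
    qr/dn has wrong character count/, 'short second input' );
ok( !eval { compare_TSD('ACGTACGTAC'); 1 },
    'croaks when an argument is missing' );

done_testing();
```

Each condition from your list could become one more test line like these, which would also answer my question about which inputs meet or fail which condition.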
