in reply to Filtering matches of near-perfect-matched DNA sequence pairs
I'm not up on biology terminology. If your "starting material is 2 sequences, each 10 nucleotides long", does that mean both inputs should be exactly 10 characters long and contain only A, C, G, and T? If so, I'd start out like this:
    use strict;
    use warnings;
    use Carp;

    sub compare_TSD {
        # 'my' can't declare individual hash elements, so declare %tsd first
        my %tsd;
        ( $tsd{up}, $tsd{dn} ) = @_;
        croak "compare_TSD called without two defined values"
            unless ( defined $tsd{up} and defined $tsd{dn} );

        # Validate both inputs, collecting every problem into one status string
        my $return_status = '';
        for my $arg ( qw/up dn/ ) {
            my $str = $arg;
            $str .= " has unusable characters," if ( $tsd{$arg} =~ /[^ACGT]/ );
            $str .= " has wrong character count," if ( length( $tsd{$arg} ) != 10 );
            $return_status .= " $str" if ( $str ne $arg );
        }
        return $return_status if ( $return_status ne '' );

        return "perfect match" if ( $tsd{up} eq $tsd{dn} );

        # not sure what else needs to be checked...
        return "not a perfect match";    # placeholder until the other checks are decided
    }

By putting the logic into a subroutine, you can modularize it, work out the specs and unit tests for its intended behaviors, and make it reusable (not bound up inside any single larger script). You can also decide what values it should return, so that any caller can easily act on the result.
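The reply suggests writing unit tests for the subroutine and leaves open what else should be checked. Here is a minimal sketch of one possible next check, assuming "near-perfect" means the two sequences differ at only a few positions. `count_mismatches` is a hypothetical helper, not part of the original reply; the tests use the core Test::More module.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper (not from the original reply): counts the positions
# at which two equal-length strings differ. Bitwise string XOR yields "\0"
# wherever the characters are identical, and tr/\0//c counts every other
# character without modifying the string.
sub count_mismatches {
    my ( $up, $dn ) = @_;
    my $xor = $up ^ $dn;
    return $xor =~ tr/\0//c;
}

is( count_mismatches( 'ACGTACGTAC', 'ACGTACGTAC' ), 0,  'identical pair' );
is( count_mismatches( 'ACGTACGTAC', 'ACGTACGTAA' ), 1,  'one substitution' );
is( count_mismatches( 'ACGTACGTAC', 'TCGTACGTAG' ), 2,  'two substitutions' );
is( count_mismatches( 'AAAAAAAAAA', 'TTTTTTTTTT' ), 10, 'all positions differ' );

done_testing();
```

The XOR trick only makes sense for strings of equal length, which the length check in `compare_TSD` already guarantees. A caller could then treat, say, one or two mismatches as "near-perfect"; that threshold is an assumption to settle against the actual biological requirement.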