I wrote in "Re: Identifying Overlapping Matches in Nucleotide Sequence":

"Biological data are typically huge. For reasons of efficiency, when dealing with this type of data, you should choose a fast solution over a slower one. Perl's string handling functions ... are measurably faster than regexes ..."

Here's a solution that uses the string handling functions length and substr (no regexes are used at all):

#!/usr/bin/env perl -l

use strict;
use warnings;

my @dna_seqs = qw{ATGCCCGTAC GCTTCCCAGCGC};

print "$_ => ", dna_prot_map($_) for @dna_seqs;

{
    my %code;

    BEGIN {
        %code = qw{ATG M CCC P GTA V GCT A TCC S CAG Q CGC R}
    }

    sub dna_prot_map {
        join '', map $code{substr $_[0], $_*3, 3}, 0..length($_[0])/3-1
    }
}

Output:

ATGCCCGTAC => MPV
GCTTCCCAGCGC => ASQR
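
The speed claim quoted at the top is easy to check on your own machine. What follows is only a sketch: the test sequence, the sub names codons_substr and codons_regex, and the one-second run time are assumptions made for illustration and do not come from the original thread. It uses cmpthese from the core Benchmark module to compare a substr-based split with a regex-based one.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw{cmpthese};

# Hypothetical test data: a 30,000-base sequence built by repetition.
my $seq = 'ATGCCCGTAC' x 3000;

# Split into 3-base chunks with substr; a trailing partial codon is dropped.
sub codons_substr {
    my ($dna) = @_;
    return map { substr $dna, $_ * 3, 3 } 0 .. int(length($dna) / 3) - 1;
}

# The same split using a global regex match in list context.
sub codons_regex {
    my ($dna) = @_;
    return $dna =~ /(...)/g;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    substr => sub { my @c = codons_substr($seq) },
    regex  => sub { my @c = codons_regex($seq) },
});
```

The figures will vary with Perl version, hardware and sequence length, so it is worth running this against data that resembles your own before drawing conclusions.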

Notes:

My %code is just a subset of your %genetic_code: it only has the data required for your example sequences. You will still need all the data; you can save yourself some typing by omitting the 128 single quotes around all the keys.
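
As a small illustration of the typing saved, here is a sketch that builds the same three-codon hash both ways. The hash names %quoted and %bare are hypothetical; only the codon data comes from the example above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Fully quoted form: every key and value carries its own quotes.
my %quoted = ('ATG' => 'M', 'CCC' => 'P', 'GTA' => 'V');

# qw{} form: whitespace-separated words, no quotes or commas needed.
my %bare = qw{ATG M CCC P GTA V};

# Both hashes hold identical key/value pairs.
for my $codon (sort keys %quoted) {
    print "$codon: $quoted{$codon} / $bare{$codon}\n";
}
```

Note that qw splits on whitespace, so it only suits keys and values that contain none.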

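You can use state within your subroutine (if you're using Perl version 5.10 or higher, with the feature enabled via use feature 'state' or use 5.010); be aware, though, that this limits the variable's scope to that one subroutine. I often find that when I write code like: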

sub f {
    state $static_var = ...

    ... do something with $static_var here ...
}

instead of like:

{
    my $static_var; BEGIN { $static_var = ... }

    sub f { ... do something with $static_var here ... }
}

I subsequently find I need to share $static_var with another routine. This requires a major rewrite which ends up looking very much like the version with BEGIN:

{
    my $static_var; BEGIN { $static_var = ... }

    sub f { ... do something with $static_var here ... }

    sub g { ... do something with $static_var here ... }
}

Just having to add 'sub g { ... }' to existing code is a lot less work and a lot less error-prone.
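
For anyone who wants to try the pattern, here is a minimal runnable sketch of the BEGIN version with two subs sharing one variable. The names $counter, next_id and current_id are hypothetical, chosen only for illustration. The first call sits before the block in file order to show why BEGIN matters: the variable is already initialised at compile time.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# This call runs before the block below is reached at run time.
# It works because BEGIN already set $counter during compilation.
print next_id(), "\n";       # prints 101

{
    my $counter;
    BEGIN { $counter = 100 }

    # Both subs close over the same $counter.
    sub next_id    { return ++$counter }
    sub current_id { return $counter }
}

print next_id(), "\n";       # prints 102
print current_id(), "\n";    # prints 102
```

With a plain "my $counter = 100;" in place of the BEGIN block, the assignment would not happen until execution reached the block, so the first call would start from an undefined value.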

How you choose to do it is up to you: I'm only offering advice about possible pitfalls, based on my experience.

— Ken

In reply to Re: Translation Substring Error by kcott
in thread Translation Substring Error by FIJI42
