G'day FIJI42,

I wrote in "Re: Identifying Overlapping Matches in Nucleotide Sequence":

"Biological data are typically huge. For reasons of efficiency, when dealing with this type of data, you should choose a fast solution over a slower one. Perl's string handling functions ... are measurably faster than regexes ..."

Here's a solution that uses the string handling functions length and substr (no regexes are used at all):

#!/usr/bin/env perl -l use strict; use warnings; my @dna_seqs = qw{ATGCCCGTAC GCTTCCCAGCGC}; print "$_ => ", dna_prot_map($_) for @dna_seqs; { my %code; BEGIN { %code = qw{ATG M CCC P GTA V GCT A TCC S CAG Q CGC R} } sub dna_prot_map { join '', map $code{substr $_[0], $_*3, 3}, 0..length($_[0])/3- +1 } }

Output:

ATGCCCGTAC => MPV GCTTCCCAGCGC => ASQR

Notes:

My %code is just a subset of your %genetic_code: it only has the data required for your example sequences. You will still need all the data; you can save yourself some typing by omitting the 128 single quotes around all the keys.

You can use state within your subroutine (if you're using Perl version 5.10 or higher); although, be aware that limits the scope. I often find that when I write code like:

sub f { state $static_var = ... ... do something with $static_var here ... }

instead of like:

{ my $static_var; BEGIN { $static_var = ... } sub f { ... do something with $static_var here ... } }

I subsequently find I need to share $static_var with another routine. This requires a major rewrite which ends up looking very much like the version with BEGIN:

{ my $static_var; BEGIN { $static_var = ... } sub f { ... do something with $static_var here ... } sub g { ... do something with $static_var here ... } }

Just having to add 'sub g { ... }' to existing code is a lot less work and a lot less error-prone.

How you choose to do it is up to you: I'm only providing advice of possible pitfalls based on my experience.

— Ken


In reply to Re: Translation Substring Error by kcott
in thread Translation Substring Error by FIJI42

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.