ic23oluk has asked for the wisdom of the Perl Monks concerning the following question:

hello monks,

I'm trying to read the RNA triplet code into a hash, where the triplets are the keys and the one-letter codes of the amino acids are the values. I read the information from a txt file that has the following structure:

X whitespace codon_1 codon_2 ...

. . .

X stands for the letter of the amino acid.

Here's my code:

my %code;
my $file = 'code.txt';
open (READ, $file) || die "Cannot open $file: $!\n";
while (my $line = <READ>){
    chomp $line;
    if ($line =~ /^(\w+)\s+([\w]+)$/i){
        (%code) = ( "$2" => $1 );
        next;
    }
    if ($line =~ /^(\w+)\s+([\w]+)\s+([\w]+)$/i){
        (%code) = ( "$2" => $1, "$3" => $1 );
        next;
    }
    if ($line =~ /^(\w+)\s+([\w]+)\s+([\w]+)\s+([\w]+)$/i){
        (%code) = ( "$2" => $1, "$3" => $1, "$4" => $1 );
        next;
    }
    if ($line =~ /^(\w+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)$/i){
        (%code) = ( "$2" => $1, "$3" => $1, "$4" => $1, "$5" => $1 );
        next;
    }
    if ($line =~ /^(\w+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)$/i){
        (%code) = ( "$2" => $1, "$3" => $1, "$4" => $1, "$5" => $1, "$6" => $1 );
        next;
    }
    if ($line =~ /^(\w+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)\s+([\w]+)$/i){
        (%code) = ( "$2" => $1, "$3" => $1, "$4" => $1, "$5" => $1, "$6" => $1, "$7" => $1 );
        next;
    }
}
foreach (keys %code){
    print $_, "\t", $code{"$_"}, "\n";
}

The output I get is just the last line (UAA, UAG, UGA Stop). Could anyone point out the problem?

thanks in advance

Re: reading RNA codons into hash
by choroba (Cardinal) on Jul 13, 2017 at 09:48 UTC
    By assigning to the whole hash, you're overwriting it for each line.
    %hash = (key => 'value'); # Removes previous contents of %hash.

    Assign just to the value corresponding to a key:

    $hash{key} = 'value';

    In your case, it's

    $code{$2} = $1; $code{$3} = $1; ...
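To make the difference concrete, here is a minimal sketch. The three codon/amino-acid pairs and the variable names `%overwritten` and `%accumulated` are made up for illustration; they are not taken from the original code.txt.

```perl
use strict;
use warnings;

# Illustrative pairs only; any key/value list would behave the same way.
my @pairs = ( [ AUG => 'M' ], [ UGG => 'W' ], [ UAA => 'Stop' ] );

# List assignment to the whole hash: every pass replaces what was stored before.
my %overwritten;
%overwritten = ( $_->[0] => $_->[1] ) for @pairs;

# Element assignment: every pass adds one key and leaves the others alone.
my %accumulated;
$accumulated{ $_->[0] } = $_->[1] for @pairs;

printf "whole-hash assignment kept %d key(s): %s\n",
    scalar( keys %overwritten ), join( ' ', sort keys %overwritten );
printf "element assignment kept %d key(s): %s\n",
    scalar( keys %accumulated ), join( ' ', sort keys %accumulated );
```

The first hash ends up holding only UAA, the last pair assigned, which matches the symptom described in the question; the second holds all three keys.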

    Update:

    Also, [\w] can be written as just \w. Using /i only makes sense if your regex contains letters; yours contains only \w and \s. Moreover, you can probably simplify the whole program (I'm just guessing, as you haven't provided any sample data) to

    while (my $line = <READ>) {
        my ($acid, @codons) = split ' ', $line;
        $code{$_} = $acid for @codons;
    }
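For anyone who wants to try the split approach without a code.txt on disk, here is a self-contained sketch. The sample data is invented (the real file was never posted). The lexical filehandle with three-argument open and the in-memory file are my own substitutions for the bareword READ handle, not part of the suggestion above.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for code.txt: amino acid first, then its codons.
my $sample = <<'END';
M AUG
W UGG

Stop UAA UAG UGA
END

# Opening a reference to a string gives an in-memory file,
# so the sketch runs without any file on disk.
open my $fh, '<', \$sample or die "Cannot open sample data: $!\n";

my %code;
while ( my $line = <$fh> ) {
    # split ' ' ignores leading whitespace and swallows the trailing newline,
    # so no chomp is needed; a blank line yields an empty list.
    my ( $acid, @codons ) = split ' ', $line;
    next unless defined $acid;
    $code{$_} = $acid for @codons;
}
close $fh;

print "$_\t$code{$_}\n" for sort keys %code;
```

Because the number of codons per line no longer matters, the six separate regexes of the original collapse into one split.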

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Thank you very much!
Re: reading RNA codons into hash
by QM (Parson) on Jul 13, 2017 at 10:00 UTC
    Not sure what your problem is, but that's way too much code :D

    I think you want something like this:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    our %code;

    while (<>){
        chomp;
        my @tokens = split " ", $_;
        my $protein = shift @tokens;
        for my $token (@tokens) {
            $code{$token} = $protein;
        }
    }

    foreach my $key (sort keys %code){
        print "$key\t$code{$key}\n";
    }

    Using this input file:

    A ABC DEF GHI JKL
    B QRS TUV WXY
    C EYE LUV YOU

    I get this result:

    ABC     A
    DEF     A
    EYE     C
    GHI     A
    JKL     A
    LUV     C
    QRS     B
    TUV     B
    WXY     B
    YOU     C

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Thank you, too!
Re: reading RNA codons into hash
by AnomalousMonk (Archbishop) on Jul 13, 2017 at 10:35 UTC

    This is the sort of thing I think should be encapsulated in a module rather than in a funky old .txt file you have to parse every time you use it. Here's an example I put together in another context. It's directed to DNA rather than RNA (I think ... I'm not a BioMonk), but you should be able to get the general idea.

    File: CodonToAmino.pm:
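As a rough illustration of the module idea only: the sketch below is not the CodonToAmino.pm referred to above. The package name `My::CodonTable` and the functions `codon_to_amino` and `translate` are invented for this example. It assumes the standard genetic code in RNA spelling, uses '*' for stop codons, and accepts DNA spelling by mapping T to U. For a single-file demo the package sits in a block; in practice it would live in its own .pm file.

```perl
use 5.014;    # package BLOCK syntax and the // operator
use strict;
use warnings;

# Hypothetical module, inlined here so the sketch runs as one file.
package My::CodonTable {
    # Standard genetic code, RNA alphabet; '*' marks a stop codon.
    my %codons_for = (
        F   => [qw(UUU UUC)],
        L   => [qw(UUA UUG CUU CUC CUA CUG)],
        I   => [qw(AUU AUC AUA)],
        M   => [qw(AUG)],
        V   => [qw(GUU GUC GUA GUG)],
        S   => [qw(UCU UCC UCA UCG AGU AGC)],
        P   => [qw(CCU CCC CCA CCG)],
        T   => [qw(ACU ACC ACA ACG)],
        A   => [qw(GCU GCC GCA GCG)],
        Y   => [qw(UAU UAC)],
        H   => [qw(CAU CAC)],
        Q   => [qw(CAA CAG)],
        N   => [qw(AAU AAC)],
        K   => [qw(AAA AAG)],
        D   => [qw(GAU GAC)],
        E   => [qw(GAA GAG)],
        C   => [qw(UGU UGC)],
        W   => [qw(UGG)],
        R   => [qw(CGU CGC CGA CGG AGA AGG)],
        G   => [qw(GGU GGC GGA GGG)],
        '*' => [qw(UAA UAG UGA)],
    );

    # Invert once into codon => amino acid, built by element assignment.
    my %amino_for;
    for my $amino ( keys %codons_for ) {
        $amino_for{$_} = $amino for @{ $codons_for{$amino} };
    }

    # Look up one codon; returns undef for anything not in the table.
    sub codon_to_amino {
        my ($codon) = @_;
        $codon = uc $codon;
        $codon =~ tr/T/U/;    # accept DNA spelling as well
        return $amino_for{$codon};
    }

    # Translate a sequence three bases at a time; unknown codons become '?',
    # and one or two leftover bases at the end are ignored.
    sub translate {
        my ($seq) = @_;
        return join '', map { codon_to_amino($_) // '?' } $seq =~ /(...)/g;
    }
}

print My::CodonTable::translate('AUGGCCUAA'), "\n";    # prints MA*
```

Keeping the table in amino-acid-to-codons form mirrors the layout of the txt file in the question, and inverting it once at load time means callers never have to parse anything.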


    Give a man a fish:  <%-{-{-{-<

      Strange but true: there isn't just one genetic code, there are several depending on the organism; see list of genetic codes. There are also some additional weird variants due to RNA editing that happens in certain situations.