Update: DNA, RNA, what's the difference? My original
code used the cDNA, not the mRNA. I changed it and reran it,
and everyone's code now works except for japhy's.
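For anyone wondering what the change amounted to: the cDNA and the mRNA differ only in T versus U, so the conversion is a one-liner. A minimal sketch (the helper name cdna_to_mrna is made up for illustration, it is not part of the script below):

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a cDNA (coding strand) sequence in mRNA letters.
sub cdna_to_mrna {
    my ($seq) = @_;
    $seq =~ tr/Tt/Uu/;    # thymine becomes uracil, case preserved
    return $seq;
}

print cdna_to_mrna('ATGCAGAGGTCGCCTCTG'), "\n";    # prints AUGCAGAGGUCGCCUCUG
```

With the T's left in place, every codon containing a T misses a table keyed on U, which would be consistent with the short wrong translation further down that lacks F, L, I, M, V, Y, C and W entirely.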
OK, this is going to be a long one...
I was going to benchmark these golf examples to see which one
was fastest, but there seems to be some cheating going on.
Honestly, I don't really understand what any of these is doing,
so I don't know if the cheating was intentional or not. To
do the benchmarking, I was going to use the
CFTR mRNA (CFTR
is the protein that, when mutated, causes cystic fibrosis).
The mRNA (with leading and trailing sequence removed)
is in the __DATA__ section of the code. The correct
translation looks like this:
MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRELASKKNPKLI
+NALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLH
+PAIFGLHHIGMQMRIAMFSLIYKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQV
+ALLMGLIWELLQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYCW
+EEAMEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILRKIFTTISFCIVL
+RMAVTRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEVVMENVTAFWEEGFGELFEKAKQN
+NNNRKTSNGDDSLFFSNFSLLGTPVLKDINFKIERGQLLAVAGSTGAGKTSLLMMIMGELEPSEGKIKH
+SGRISFCSQFSWIMPGTIKENIIFGVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQR
+ARISLARAVYKDADLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILNEG
+SSYFYGTFSELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTETKKQSFKQTG
+EFGEKRKNSILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVPDSEQGEAILPRISVISTGPT
+LQARRRQSVLNLMTHSVNQGQNIHRKTTASTRKVSLAPQANLTELDIYSRRLSQETGLEISEEINEEDL
+KECLFDDMESIPAVTTWNTYLRYITVHKSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHS
+RNNSYAVIITSTSSYYVFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTL
+KAGGILNRFSKDIAILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLRAYFLQT
+SQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTANWFLYLSTLRWFQMRIEM
+IFVIFFIAVTFISILTTGEGEGRVGIILTLAMNIMSTLQWAVNSSIDVDSLMRSVSRVFKFIDMPTEGK
+PTKSTKPYKNGQLSKVMIIENSHVKKDDIWPSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLL
+GRTGSGKSTLLSAFLRLLNTEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSD
+QEIWKVADEVGLRSVIEQFPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDPVTY
+QIIRRTLKQAFADCTVILCEHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSLFRQAISPSDRVKLF
+PHRNSSKCKSKPQIAALKEETEEEVQDTRL.
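For comparison, here is a plain, ungolfed way to get a reference translation. It is only a sketch: the sub name translate_reference is made up, the 64-letter amino acid string is borrowed from tadman's golfed sub below (UCAG codon order, '.' for stop), and unknown codons are reported as '?' instead of being dropped silently.

```perl
use strict;
use warnings;

# Standard genetic code in UCAG order, '.' marks a stop codon.
my $aa    = 'FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my @bases = qw(U C A G);

# Build the codon => amino acid table from the string above.
my %codon;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon{"$b1$b2$b3"} = substr($aa, $i++, 1);
        }
    }
}

# Hypothetical reference translator: whole codons only, '?' for anything
# that is not in the table (for example a codon that still contains T).
sub translate_reference {
    my ($rna) = @_;
    $rna =~ s/\s+//g;    # drop line feeds and other whitespace
    $rna = uc $rna;
    my $protein = '';
    for my $c ($rna =~ /(.{3})/g) {    # a trailing partial codon is ignored
        $protein .= exists $codon{$c} ? $codon{$c} : '?';
    }
    return $protein;
}

print translate_reference('AUGCAGAGGUCGCCUCUGGAAAAGUAG'), "\n";    # prints MQRSPLEK.
```

The first eight codons of the CFTR mRNA give MQRSPLEK, which matches the start of the translation above.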
However, tachyon's, MeowChow's and tadman's original code all gave this:
QRPEKASKSTRPRKGRQREDQPADEKEREREAKKPKARRRGGETKAQPGRADPNKEERAGGRTHPAGHGQ
+RAKKTKSRKGQNNNKEGAAAPQAGEQAAGGAQAGGRKRQRAGKERTEEQKAEEAEKENRQTEKTRKAAR
+SAGPAKGRKTTRATRQPAQTDGANKQQKQEKTENTTTEETAEEGGEEKAKQNNRKTGDSGTPKKERGQA
+AGTGAGKTGEEPEGKKHGRQPGTKEGERRSKAQEEDKAEKDGEGGTGGQRARARAKADPGTEKEESKAN
+KTRTKEKKADKEGSSGTEQQPDSKGDQAERRTETHREGAPTETKKQKQTGEGEKRKPNRKQKTPQGEEE
+PERRPEQGEAPRSSTGPTQARRRQNTHNQGQNHRKTTATRKAPQANTERRQETGEEENEEDKEESPATT
+NTRTHKSAEAAGNTPQDKGTRNSATSTGADTAGRGPTTKHHKQAPTNTKAGGRKADPTDQGAAAQPATP
+ARAQTQQKQEEGRPTTSKGTRAGRQPETHKATANTRQREATTTGEGEGRGTATQANSSRSRKDPTEGKP
+TKTKPKGQKEHKKDPGGQTKTAKTEGGAENPGQRGGRTGGKTARNTEGEQGTQQRKAGPQKGTRKNPEQ
+QEKAEGREQPGKDGGSGHKQARKAKEPAPTQRRTKQAATEHREAEQQEENKRQQKNERSRQASPDRKPH
+RNSKKKPQAAKEETEEEQTR
It is not at all clear to me why, and it is not at all related
to CFTR. For that matter, it's not related to any protein
in public databases. Congratulations, you did gene discovery;
pharmaceutical companies spent billions of dollars to do that :-)
Also, japhy's code returns nothing (except some line feeds apparently).
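One guess about the empty japhy output, not a confirmed diagnosis: his lookup hash is filled by an ordinary statement at file scope, and in my script that statement sits below the print calls. Sub definitions exist as soon as the file is compiled, but a plain assignment only runs when execution reaches it, so the table may still be empty when f1 is called. A minimal sketch of that ordering effect (the names %table and lookup are made up for illustration):

```perl
use strict;
use warnings;

our %table;    # global table, empty until the assignment below runs

sub lookup {
    my ($key) = @_;
    return defined $table{$key} ? $table{$key} : '';
}

my $before = lookup('AUG');    # called before the table is filled
%table = (AUG => 'M');         # run-time assignment, like @g{...}=(...)
my $after = lookup('AUG');     # called after the table is filled

print "before: '$before'\n";    # prints before: ''
print "after:  '$after'\n";     # prints after:  'M'
```

If that is what is happening, moving the table setup above the print statements, or wrapping it in a BEGIN block, should be enough to test the idea.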
So, can anyone point out the problems with these subs? I
copied them directly from the HTML, and only removed "+" at
the beginning of code-wrapped lines, and changed the names of
the subs. Here is my code:
#!/usr/bin/perl
while (<DATA>) {
$cftr=$_;
}
print "tadman original\n".f0($cftr)."\n\n";
print "japhy\n".f1($cftr)."\n\n";
print "MeowChow\n".f2($cftr)."\n\n";
print "no_slogan\n".f3($cftr)."\n\n";
print "srawls\n".f4($cftr)."\n\n";
print "tachyon\n".RNA($cftr)."\n\n";
print "tadman golf\n".f5($cftr)."\n\n";
sub f0 { # original by tadman
my %g = (
# . - Stop
'UAA'=>'.','UAG'=>'.','UGA'=>'.',
# A - Alanine
'GCU'=>'A','GCC'=>'A','GCA'=>'A','GCG'=>'A',
# C - Cysteine
'UGU'=>'C','UGC'=>'C',
# D - Aspartic Acid
'GAU'=>'D','GAC'=>'D',
# E - Glutamic Acid
'GAA'=>'E','GAG'=>'E',
# F - Phenylalanine
'UUU'=>'F','UUC'=>'F',
# G - Glycine
'GGU'=>'G','GGC'=>'G','GGA'=>'G','GGG'=>'G',
# H - Histidine
'CAU'=>'H','CAC'=>'H',
# I - Isoleucine
'AUU'=>'I','AUC'=>'I','AUA'=>'I',
# K - Lysine
'AAA'=>'K','AAG'=>'K',
# L - Leucine
'CUU'=>'L','CUC'=>'L','CUA'=>'L','CUG'=>'L',
'UUA'=>'L','UUG'=>'L',
# M - Methionine
'AUG'=>'M',
# N - Asparagine
'AAU'=>'N','AAC'=>'N',
# P - Proline
'CCU'=>'P','CCC'=>'P','CCA'=>'P','CCG'=>'P',
# Q - Glutamine
'CAA'=>'Q','CAG'=>'Q',
# R - Arginine
'CGU'=>'R','CGC'=>'R','CGA'=>'R','CGG'=>'R',
'AGA'=>'R','AGG'=>'R',
# S - Serine
'UCU'=>'S','UCC'=>'S','UCA'=>'S','UCG'=>'S',
'AGU'=>'S','AGC'=>'S',
# T - Threonine
'ACU'=>'T','ACC'=>'T','ACA'=>'T','ACG'=>'T',
# V - Valine
'GUU'=>'V','GUC'=>'V','GUA'=>'V','GUG'=>'V',
# W - Tryptophan
'UGG'=>'W',
# Y - Tyrosine
'UAU'=>'Y','UAC'=>'Y',
);
$_=pop;s/.{1,3}/$g{$&}/g;$_
}
sub #japhy
B(){''}sub
Z(){(B)x13}sub
U(){(B)x31}sub
O(){(B)x83}sub
J(){(B)x343}sub
b(){B,B,B}@g{AAA..UUU}=(K,B,N,b,K,Z,N,U,T,B,T,b,T,Z,T,O,R,B,S,b,R,Z,S,J,
I,B,I,b,M,Z,I,(B)x811,Q,B,H,b,Q,Z,H,U,P,B,P,b,P,Z,P,O,R,B,R,b,R,Z,R,J,L,
B,L,b,L,Z,L,(B)x2163,E,B,D,b,E,Z,D,U,A,B,A,b,A,Z,A,O,G,B,G,b,G,Z,G,J,V,B
,V,b,V,Z,V,(B)x8923,'.',B,Y,b,'.',Z,Y,U,S,B,S,b,S,Z,S,O,'.',B,C,b,W,Z,C,
J,L,B,F,b,L,Z,F);sub
f1{$_=pop;s/..?.?/$g{$&}/g;$_}
sub f2{ #MeowChow
my@r=qw(UA[AG]|UGA GC. - UG[UC] GA[UC] GA[AG] UU[UC] GG. CA[UC] AU[^G] - AA[AG] CU.|UU[AG] AUG AA[UC] - CC. CA[AG] CG.|AG[AG] UC.|AG[UC] AC. - GU. UGG - UA[UC] ^);
((my$t=pop)=~s|..?.?|chr 64+(grep$&=~/$r[$_]/,0..26)[0]|eg);$t=~y/@Z/./d;$t
}
sub f3 { #no_slogan
$_="KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg";s/[a-z]/uc$&x4/eg;@x=/./g;join"",@x[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]
}
sub f4 { #srawls
$_="KNNKtIIIMRSSRQHHQplr.YY.sLFFL.CCWEDDEavg";s/[a-z]/uc$&x4/eg;
join"",(/./g)[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]
}
sub RNA { #tachyon
@_{'UAAUAGUGAGCUGCCGCAGCGUGUUGCGAUGACGAAGAGUUUUUCGGUGGCGGAGGGCAUCACAUUAUCAUAAAAAAGCUUCUCCUACUGUUAUUGAUGAAUAACCCUCCCCCACCGCAACAGCGUCGCCGACGGAGAAGG
UCUUCCUCAUCGAGUAGCACUACCACAACGGUUGUCGUAGUGUGGUAUUAC'=~/(...)/g}=split//,'...AAAACCDDEEFFGGGGHHIIIKKLLLLLLMNNPPPPQQRRRRRRSSSSSSTTTTVVVVWYY';$_=pop
;s/..?.?/$_{$&}/g;$_
}
sub f5{ #tadman
$_=pop;y/UCAG/0123/;s/(.)(.)(.)/substr
"FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
,$1<<4|$2<<2|$3,1/ge;y/0123//d;$_
}
#>gi|6995995|ref|NM_000492.2| Homo sapiens cystic fibrosis transmembrane conductance regulator, ATP-binding cassette (sub-family C, member 7) (CFTR), mRNA
__DATA__
AUGCAGAGGUCGCCUCUGGAAAAGGCCAGCGUUGUCUCCAAACUUUUUUUCAGCUGGACCAGACCAAUUU
+UGAGGAAAGGAUACAGACAGCGCCUGGAAUUGUCAGACAUAUACCAAAUCCCUUCUGUUGAUUCUGCUG
+ACAAUCUAUCUGAAAAAUUGGAAAGAGAAUGGGAUAGAGAGCUGGCUUCAAAGAAAAAUCCUAAACUCA
+UUAAUGCCCUUCGGCGAUGUUUUUUCUGGAGAUUUAUGUUCUAUGGAAUCUUUUUAUAUUUAGGGGAAG
+UCACCAAAGCAGUACAGCCUCUCUUACUGGGAAGAAUCAUAGCUUCCUAUGACCCGGAUAACAAGGAGG
+AACGCUCUAUCGCGAUUUAUCUAGGCAUAGGCUUAUGCCUUCUCUUUAUUGUGAGGACACUGCUCCUAC
+ACCCAGCCAUUUUUGGCCUUCAUCACAUUGGAAUGCAGAUGAGAAUAGCUAUGUUUAGUUUGAUUUAUA
+AGAAGACUUUAAAGCUGUCAAGCCGUGUUCUAGAUAAAAUAAGUAUUGGACAACUUGUUAGUCUCCUUU
+CCAACAACCUGAACAAAUUUGAUGAAGGACUUGCAUUGGCACAUUUCGUGUGGAUCGCUCCUUUGCAAG
+UGGCACUCCUCAUGGGGCUAAUCUGGGAGUUGUUACAGGCGUCUGCCUUCUGUGGACUUGGUUUCCUGA
+UAGUCCUUGCCCUUUUUCAGGCUGGGCUAGGGAGAAUGAUGAUGAAGUACAGAGAUCAGAGAGCUGGGA
+AGAUCAGUGAAAGACUUGUGAUUACCUCAGAAAUGAUUGAAAAUAUCCAAUCUGUUAAGGCAUACUGCU
+GGGAAGAAGCAAUGGAAAAAAUGAUUGAAAACUUAAGACAAACAGAACUGAAACUGACUCGGAAGGCAG
+CCUAUGUGAGAUACUUCAAUAGCUCAGCCUUCUUCUUCUCAGGGUUCUUUGUGGUGUUUUUAUCUGUGC
+UUCCCUAUGCACUAAUCAAAGGAAUCAUCCUCCGGAAAAUAUUCACCACCAUCUCAUUCUGCAUUGUUC
+UGCGCAUGGCGGUCACUCGGCAAUUUCCCUGGGCUGUACAAACAUGGUAUGACUCUCUUGGAGCAAUAA
+ACAAAAUACAGGAUUUCUUACAAAAGCAAGAAUAUAAGACAUUGGAAUAUAACUUAACGACUACAGAAG
+UAGUGAUGGAGAAUGUAACAGCCUUCUGGGAGGAGGGAUUUGGGGAAUUAUUUGAGAAAGCAAAACAAA
+ACAAUAACAAUAGAAAAACUUCUAAUGGUGAUGACAGCCUCUUCUUCAGUAAUUUCUCACUUCUUGGUA
+CUCCUGUCCUGAAAGAUAUUAAUUUCAAGAUAGAAAGAGGACAGUUGUUGGCGGUUGCUGGAUCCACUG
+GAGCAGGCAAGACUUCACUUCUAAUGAUGAUUAUGGGAGAACUGGAGCCUUCAGAGGGUAAAAUUAAGC
+ACAGUGGAAGAAUUUCAUUCUGUUCUCAGUUUUCCUGGAUUAUGCCUGGCACCAUUAAAGAAAAUAUCA
+UCUUUGGUGUUUCCUAUGAUGAAUAUAGAUACAGAAGCGUCAUCAAAGCAUGCCAACUAGAAGAGGACA
+UCUCCAAGUUUGCAGAGAAAGACAAUAUAGUUCUUGGAGAAGGUGGAAUCACACUGAGUGGAGGUCAAC
+GAGCAAGAAUUUCUUUAGCAAGAGCAGUAUACAAAGAUGCUGAUUUGUAUUUAUUAGACUCUCCUUUUG
+GAUACCUAGAUGUUUUAACAGAAAAAGAAAUAUUUGAAAGCUGUGUCUGUAAACUGAUGGCUAACAAAA
+CUAGGAUUUUGGUCACUUCUAAAAUGGAACAUUUAAAGAAAGCUGACAAAAUAUUAAUUUUGAAUGAAG
+GUAGCAGCUAUUUUUAUGGGACAUUUUCAGAACUCCAAAAUCUACAGCCAGACUUUAGCUCAAAACUCA
+UGGGAUGUGAUUCUUUCGACCAAUUUAGUGCAGAAAGAAGAAAUUCAAUCCUAACUGAGACCUUACACC
+GUUUCUCAUUAGAAGGAGAUGCUCCUGUCUCCUGGACAGAAACAAAAAAACAAUCUUUUAAACAGACUG
+GAGAGUUUGGGGAAAAAAGGAAGAAUUCUAUUCUCAAUCCAAUCAACUCUAUACGAAAAUUUUCCAUUG
+UGCAAAAGACUCCCUUACAAAUGAAUGGCAUCGAAGAGGAUUCUGAUGAGCCUUUAGAGAGAAGGCUGU
+CCUUAGUACCAGAUUCUGAGCAGGGAGAGGCGAUACUGCCUCGCAUCAGCGUGAUCAGCACUGGCCCCA
+CGCUUCAGGCACGAAGGAGGCAGUCUGUCCUGAACCUGAUGACACACUCAGUUAACCAAGGUCAGAACA
+UUCACCGAAAGACAACAGCAUCCACACGAAAAGUGUCACUGGCCCCUCAGGCAAACUUGACUGAACUGG
+AUAUAUAUUCAAGAAGGUUAUCUCAAGAAACUGGCUUGGAAAUAAGUGAAGAAAUUAACGAAGAAGACU
+UAAAGGAGUGCCUUUUUGAUGAUAUGGAGAGCAUACCAGCAGUGACUACAUGGAACACAUACCUUCGAU
+AUAUUACUGUCCACAAGAGCUUAAUUUUUGUGCUAAUUUGGUGCUUAGUAAUUUUUCUGGCAGAGGUGG
+CUGCUUCUUUGGUUGUGCUGUGGCUCCUUGGAAACACUCCUCUUCAAGACAAAGGGAAUAGUACUCAUA
+GUAGAAAUAACAGCUAUGCAGUGAUUAUCACCAGCACCAGUUCGUAUUAUGUGUUUUACAUUUACGUGG
+GAGUAGCCGACACUUUGCUUGCUAUGGGAUUCUUCAGAGGUCUACCACUGGUGCAUACUCUAAUCACAG
+UGUCGAAAAUUUUACACCACAAAAUGUUACAUUCUGUUCUUCAAGCACCUAUGUCAACCCUCAACACGU
+UGAAAGCAGGUGGGAUUCUUAAUAGAUUCUCCAAAGAUAUAGCAAUUUUGGAUGACCUUCUGCCUCUUA
+CCAUAUUUGACUUCAUCCAGUUGUUAUUAAUUGUGAUUGGAGCUAUAGCAGUUGUCGCAGUUUUACAAC
+CCUACAUCUUUGUUGCAACAGUGCCAGUGAUAGUGGCUUUUAUUAUGUUGAGAGCAUAUUUCCUCCAAA
+CCUCACAGCAACUCAAACAACUGGAAUCUGAAGGCAGGAGUCCAAUUUUCACUCAUCUUGUUACAAGCU
+UAAAAGGACUAUGGACACUUCGUGCCUUCGGACGGCAGCCUUACUUUGAAACUCUGUUCCACAAAGCUC
+UGAAUUUACAUACUGCCAACUGGUUCUUGUACCUGUCAACACUGCGCUGGUUCCAAAUGAGAAUAGAAA
+UGAUUUUUGUCAUCUUCUUCAUUGCUGUUACCUUCAUUUCCAUUUUAACAACAGGAGAAGGAGAAGGAA
+GAGUUGGUAUUAUCCUGACUUUAGCCAUGAAUAUCAUGAGUACAUUGCAGUGGGCUGUAAACUCCAGCA
+UAGAUGUGGAUAGCUUGAUGCGAUCUGUGAGCCGAGUCUUUAAGUUCAUUGACAUGCCAACAGAAGGUA
+AACCUACCAAGUCAACCAAACCAUACAAGAAUGGCCAACUCUCGAAAGUUAUGAUUAUUGAGAAUUCAC
+ACGUGAAGAAAGAUGACAUCUGGCCCUCAGGGGGCCAAAUGACUGUCAAAGAUCUCACAGCAAAAUACA
+CAGAAGGUGGAAAUGCCAUAUUAGAGAACAUUUCCUUCUCAAUAAGUCCUGGCCAGAGGGUGGGCCUCU
+UGGGAAGAACUGGAUCAGGGAAGAGUACUUUGUUAUCAGCUUUUUUGAGACUACUGAACACUGAAGGAG
+AAAUCCAGAUCGAUGGUGUGUCUUGGGAUUCAAUAACUUUGCAACAGUGGAGGAAAGCCUUUGGAGUGA
+UACCACAGAAAGUAUUUAUUUUUUCUGGAACAUUUAGAAAAAACUUGGAUCCCUAUGAACAGUGGAGUG
+AUCAAGAAAUAUGGAAAGUUGCAGAUGAGGUUGGGCUCAGAUCUGUGAUAGAACAGUUUCCUGGGAAGC
+UUGACUUUGUCCUUGUGGAUGGGGGCUGUGUCCUAAGCCAUGGCCACAAGCAGUUGAUGUGCUUGGCUA
+GAUCUGUUCUCAGUAAGGCGAAGAUCUUGCUGCUUGAUGAACCCAGUGCUCAUUUGGAUCCAGUAACAU
+ACCAAAUAAUUAGAAGAACUCUAAAACAAGCAUUUGCUGAUUGCACAGUAAUUCUCUGUGAACACAGGA
+UAGAAGCAAUGCUGGAAUGCCAACAAUUUUUGGUCAUAGAAGAGAACAAAGUGCGGCAGUACGAUUCCA
+UCCAGAAACUGCUGAACGAGAGGAGCCUCUUCCGGCAAGCCAUCAGCCCCUCCGACAGGGUGAAGCUCU
+UUCCCCACCGGAACUCAAGCAAGUGCAAGUCUAAGCCCCAGAUUGCUGCUCUGAAAGAGGAGACAGAAG
+AAGAGGUGCAAGAUACAAGGCUUUAG
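For what it's worth, the benchmark I had in mind would look roughly like this once all the subs agree on the output. This is only a sketch: by_hash and by_index are made-up stand-ins for the real entries, the test sequence is just the first few CFTR codons plus a stop, repeated, and cmpthese comes from the core Benchmark module. The script refuses to time anything until the candidates return the same string, which is the check that would have caught the problem above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Standard genetic code in UCAG order, '.' for stop (string taken from f5).
my $AA = 'FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Candidate 1 (hypothetical): one hash lookup per codon.
my %CODON;
{
    my @base = qw(U C A G);
    my $i    = 0;
    for my $x (@base) {
        for my $y (@base) {
            for my $z (@base) {
                $CODON{"$x$y$z"} = substr $AA, $i++, 1;
            }
        }
    }
}

sub by_hash {
    my ($rna) = @_;
    return join '', map { exists $CODON{$_} ? $CODON{$_} : '?' } $rna =~ /(...)/g;
}

# Candidate 2 (hypothetical): turn each codon into an index into $AA.
sub by_index {
    my ($rna) = @_;
    (my $digits = $rna) =~ tr/UCAG/0123/;
    my $protein = '';
    while ($digits =~ /([0-3])([0-3])([0-3])/g) {
        $protein .= substr $AA, $1 * 16 + $2 * 4 + $3, 1;
    }
    return $protein;
}

my $rna = 'AUGCAGAGGUCGCCUCUGGAAAAGUAG' x 200;

# Correctness first: only time the candidates if they agree.
die "candidates disagree\n" unless by_hash($rna) eq by_index($rna);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    by_hash  => sub { by_hash($rna) },
    by_index => sub { by_index($rna) },
});
```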
Happy debugging golfed code,
Scott