Thank you all very much for the super fast replies. I will work my way through the tips and tricks and post an update as soon as I can.

Since it was asked for, here is a more detailed description of my problem: I have several folders, each of which contains two specific files: one whose name ends in ".mapped_sequences" and one that always has the same name, "unitas.tRF-table.txt".

The mapped_sequences file looks like this, always a number followed by a gene sequence:

>1
CCTCCTCTACCTCATCCCAGTT
>1
GGGTTCGATTCCCGGTCAGGGAT

The other file looks like this (I left out the four header lines and show only a few example lines, as the whole file is rather big):

source_tRNA	5p-tR-halves (fractionated)	5p-tR-halves (absolute)	5p-tRFs (fractionated)	5p-tRFs (absolute)	3p-tR-halves (fractionated)	3p-tR-halves (absolute)	3p-CCA-tRFs (fractionated)	3p-CCA-tRFs (absolute)	3p-tRFs (fractionated)	3p-tRFs (absolute)	tRF-1 (fractionated)	tRF-1 (absolute)	tRNA-leader (fractionated)	tRNA-leader (absolute)	misc-tRFs (fractionated)	misc-tRFs (absolute)
MT-TL2	0	0	0	0	0	0	0	0	0	0	0	0	6.16666666666667	18	0	0
MT-TL2-ENSG00000210191.1	1	1	4	4	0	0	0	0	0	0	0	0	0	0	124	124
MT-TM	0	0	0	0	0	0	0	0	0	0	6	6	0	0	0	0
MT-TM-ENSG00000210112.1	13	13	9	9	0	0	0	0	0	0	0	0	0	0	40.8333333333333	43
MT-TN	0	0	0	0	0	0	0	0	0	0	1.5	3	2	2	0	0
MT-TN-ENSG00000210135.1	0	0	1	1	0	0	0	0	0	0	0	0	0	0	25.25	26
MT-TP	0	0	0	0	0	0	0	0	0	0	2	2	0	0	0	0
tRNA-Ala-AGC-1-1	0	0	0.142857142857143	1	0	0	0	0	0	0	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-11-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-15-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4.26111111111111	21
tRNA-Ala-AGC-2-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-2-2	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-3-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-4-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	1.17407407407407	13
tRNA-Ala-AGC-5-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-6-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	2
tRNA-Ala-AGC-7-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-8-1	0	0	0.5	1	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-8-2	0	0	0.5	1	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-9-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.511111111111111	3
tRNA-Ala-AGC-9-2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.511111111111111	3
tRNA-Ala-CGC-1-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	5.84074074074074	21
tRNA-Ala-CGC-2-1	0	0	5.75	46	0	0	0	0	0	0	19	19	1	1	4.75740740740741	21
tRNA-Ala-CGC-3-1	0	0	5.75	46	0	0	0	0	0	0	10	10	0	0	6.07407407407407	8
tRNA-Ala-CGC-4-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.28835978835979	11
tRNA-Ala-TGC-1-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	5.12645502645503	24
tRNA-Ala-TGC-2-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	5.12645502645503	24
tRNA-Ala-TGC-3-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	29.2931216931217	74
tRNA-Ala-TGC-3-2	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	29.2931216931217	74
tRNA-Ala-TGC-4-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	95.7097883597884	113
tRNA-Ala-TGC-5-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	2.20978835978836	17
tRNA-Ala-TGC-6-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	0.0740740740740741	2
tRNA-Ala-TGC-7-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	2.20978835978836	17
tRNA-Arg-ACG-1-1	0	0	0.2	2	0	0	0.142857142857143	1	0	0	13	13	0	0	9.83333333333333	95

So the first task was to add up all of the numbers from the first file (the reads). That is the one thing I got to work, and it does it correctly for all the files. The next task would be to recalculate the numbers in the second file (number/reads*1000000) and afterwards sum them. As you can see from the second example, there are multiple lines for the same amino-acid combination; all values for one combination should be summed and saved in a new, better organized file (the merged file, containing only the fractionated columns). I hope that explains what this script should do.
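
To make the three steps concrete, here is a minimal sketch of the calculation in Perl. It is only an illustration under assumptions, not a finished script. It assumes that the table is tab-separated, that the column header line starts with source_tRNA, and that rows belong together when they share the name left over after stripping a trailing copy number (tRNA-Ala-AGC-2-1 becomes tRNA-Ala-AGC) or Ensembl ID (MT-TM-ENSG00000210112.1 becomes MT-TM). The sub names count_reads, group_key and merge_table are made up for this example, and the demo table is trimmed to two of the eight fragment classes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: sum the read counts from the ">N" header lines.
sub count_reads {
    my ($fh) = @_;
    my $reads = 0;
    while ( my $line = <$fh> ) {
        $reads += $1 if $line =~ /^>(\d+)/;
    }
    return $reads;
}

# Assumed grouping rule (hypothetical): drop a trailing "-N-N" copy number
# or a trailing "-ENSG..." identifier from the source_tRNA name.
sub group_key {
    my ($name) = @_;
    ( my $key = $name ) =~ s/-(?:\d+-\d+|ENSG[\d.]+)$//;
    return $key;
}

# Steps 2 and 3: scale every "(fractionated)" value to number/reads*1000000
# and sum the scaled values per group.
sub merge_table {
    my ( $fh, $reads ) = @_;
    die "No reads counted\n" unless $reads;
    my ( @header, @frac_idx, %sum );
    while ( my $line = <$fh> ) {
        chomp $line;
        my @f = split /\t/, $line;
        if ( !@header ) {
            # Skip everything before the column header (assumed layout).
            next unless @f && $f[0] eq 'source_tRNA';
            @header   = @f;
            @frac_idx = grep { $header[$_] =~ /\(fractionated\)/ } 1 .. $#header;
            next;
        }
        next unless @f == @header;    # ignore incomplete lines
        my $key = group_key( $f[0] );
        $sum{$key}[$_] += $f[ $frac_idx[$_] ] / $reads * 1_000_000
            for 0 .. $#frac_idx;
    }
    return ( [ @header[@frac_idx] ], \%sum );
}

# Demo with in-memory data standing in for the two files of one folder.
my $seq_data = ">1\nCCTCCTCTACCTCATCCCAGTT\n>1\nGGGTTCGATTCCCGGTCAGGGAT\n";
my $table_data = join "\n",
    join( "\t", 'source_tRNA', '5p-tRFs (fractionated)', '5p-tRFs (absolute)',
        'misc-tRFs (fractionated)', 'misc-tRFs (absolute)' ),
    join( "\t", 'tRNA-Ala-AGC-2-1', '0.166666666666667', '2', '1.53835978835979', '12' ),
    join( "\t", 'tRNA-Ala-AGC-4-1', '5.75', '46', '1.17407407407407', '13' ),
    '';

open my $seq_fh, '<', \$seq_data   or die "Cannot open sequence data: $!";
open my $tab_fh, '<', \$table_data or die "Cannot open table data: $!";

my $reads = count_reads($seq_fh);
my ( $columns, $sum ) = merge_table( $tab_fh, $reads );

# Print the merged table: one line per group, fractionated columns only.
print join( "\t", 'source_tRNA', @$columns ), "\n";
for my $key ( sort keys %$sum ) {
    print join( "\t", $key, @{ $sum->{$key} } ), "\n";
}
```

For the real files, the two in-memory filehandles would be replaced by filehandles opened on the ".mapped_sequences" file and on "unitas.tRF-table.txt" of each folder, and the print statements would write to the merged file instead of STDOUT.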

Regarding the indentation style: what would be a common one? I have to admit I only know this one. My professor gave me a book to find my way into Perl, and that was the style used there, so I kinda stuck with it.

Once again, thank you all for the super quick replies. I am very glad I found so much help so quickly. ~Panda


In reply to Re: Skript help needed - RegEx & Hashes by PandaRaey
in thread Skript help needed - RegEx & Hashes by PandaRaey
