in reply to Skript help needed - RegEx & Hashes

Thank you all very much for the super fast replies. I will work my way through the tips and tricks and post an update as soon as I can.

Since it was asked for, here is a more detailed description of my problem: I have several folders which each contain two specific files. One ends in ".mapped_sequences" and the other always has the same name, "unitas.tRF-table.txt".

The mapped_sequences file looks like this, always with a number followed by a gene sequence:

>1
CCTCCTCTACCTCATCCCAGTT
>1
GGGTTCGATTCCCGGTCAGGGAT

The other file looks like this (without the four header lines and just a few example lines as the whole file is a bit big):

source_tRNA	5p-tR-halves (fractionated)	5p-tR-halves (absolute)	5p-tRFs (fractionated)	5p-tRFs (absolute)	3p-tR-halves (fractionated)	3p-tR-halves (absolute)	3p-CCA-tRFs (fractionated)	3p-CCA-tRFs (absolute)	3p-tRFs (fractionated)	3p-tRFs (absolute)	tRF-1 (fractionated)	tRF-1 (absolute)	tRNA-leader (fractionated)	tRNA-leader (absolute)	misc-tRFs (fractionated)	misc-tRFs (absolute)
MT-TL2	0	0	0	0	0	0	0	0	0	0	0	0	6.16666666666667	18	0	0
MT-TL2-ENSG00000210191.1	1	1	4	4	0	0	0	0	0	0	0	0	0	0	124	124
MT-TM	0	0	0	0	0	0	0	0	0	0	6	6	0	0	0	0
MT-TM-ENSG00000210112.1	13	13	9	9	0	0	0	0	0	0	0	0	0	0	40.8333333333333	43
MT-TN	0	0	0	0	0	0	0	0	0	0	1.5	3	2	2	0	0
MT-TN-ENSG00000210135.1	0	0	1	1	0	0	0	0	0	0	0	0	0	0	25.25	26
MT-TP	0	0	0	0	0	0	0	0	0	0	2	2	0	0	0	0
tRNA-Ala-AGC-1-1	0	0	0.142857142857143	1	0	0	0	0	0	0	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-11-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-15-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4.26111111111111	21
tRNA-Ala-AGC-2-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-2-2	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-3-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-4-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	1.17407407407407	13
tRNA-Ala-AGC-5-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.21693121693122	10
tRNA-Ala-AGC-6-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	2
tRNA-Ala-AGC-7-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.53835978835979	12
tRNA-Ala-AGC-8-1	0	0	0.5	1	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-8-2	0	0	0.5	1	0	0	0	0	0	0	0	0	0	0	9.99444444444444	39
tRNA-Ala-AGC-9-1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.511111111111111	3
tRNA-Ala-AGC-9-2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0.511111111111111	3
tRNA-Ala-CGC-1-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	5.84074074074074	21
tRNA-Ala-CGC-2-1	0	0	5.75	46	0	0	0	0	0	0	19	19	1	1	4.75740740740741	21
tRNA-Ala-CGC-3-1	0	0	5.75	46	0	0	0	0	0	0	10	10	0	0	6.07407407407407	8
tRNA-Ala-CGC-4-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	1.28835978835979	11
tRNA-Ala-TGC-1-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	5.12645502645503	24
tRNA-Ala-TGC-2-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	5.12645502645503	24
tRNA-Ala-TGC-3-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	29.2931216931217	74
tRNA-Ala-TGC-3-2	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	29.2931216931217	74
tRNA-Ala-TGC-4-1	0	0	5.75	46	0	0	0	0	0	0	0	0	0	0	95.7097883597884	113
tRNA-Ala-TGC-5-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	2.20978835978836	17
tRNA-Ala-TGC-6-1	0	0	0.166666666666667	2	0	0	0	0	0.0909090909090909	1	0	0	0	0	0.0740740740740741	2
tRNA-Ala-TGC-7-1	0	0	0.166666666666667	2	0	0	0	0	0	0	0	0	0	0	2.20978835978836	17
tRNA-Arg-ACG-1-1	0	0	0.2	2	0	0	0.142857142857143	1	0	0	13	13	0	0	9.83333333333333	95

So the first task was to add up all of the numbers from the first file (the reads), which is the one thing I got to work, and it does this well for all the files. The next task would be to recalculate the numbers in the 2nd file (number/reads*1000000) and then sum them up. As you can see from the 2nd example, there are multiple lines for the same amino-acid combination; all lines for one combination should be summed and saved in a new, more organized file (the merged file, containing only the columns with the fractionated values). I hope this explains what the script should do.
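
To make the calculation more concrete, the steps can be sketched roughly as follows. This is only a minimal illustration, not the actual script: the sub names (count_reads, group_key, merge_table) are made up for the example, the grouping rule (cut "tRNA-Ala-AGC-1-1" down to "tRNA-Ala-AGC" and "MT-TM-ENSG..." down to "MT-TM") is an assumption about what counts as one combination, and the sketch assumes the four header lines have already been skipped so that the column-name line comes first. The sample data is a tiny in-memory excerpt with only two of the column pairs.

```perl
use strict;
use warnings;

# Sum the read counts found on the ">N" lines of a .mapped_sequences file.
sub count_reads {
    my ($fh) = @_;
    my $reads = 0;
    while (my $line = <$fh>) {
        $reads += $1 if $line =~ /^>(\d+)/;
    }
    return $reads;
}

# ASSUMED grouping rule: strip the trailing copy numbers or Ensembl ID.
sub group_key {
    my ($name) = @_;
    return $1 if $name =~ /^(tRNA-[^-]+-[^-]+)-/;
    return $1 if $name =~ /^(MT-[^-]+)/;
    return $name;
}

# Normalize the "(fractionated)" columns as value/reads*1000000 and sum
# them per group. Returns (\@column_names, \%sums).
sub merge_table {
    my ($fh, $reads) = @_;
    my (@keep, @names, %sum);
    my $seen_header = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my @f = split /\t/, $line;
        if (!$seen_header) {
            # First line: remember which columns are fractionated.
            @keep  = grep { $f[$_] =~ /\(fractionated\)/ } 1 .. $#f;
            @names = @f[@keep];
            $seen_header = 1;
            next;
        }
        my $key = group_key($f[0]);
        for my $i (0 .. $#keep) {
            $sum{$key}[$i] += $f[ $keep[$i] ] / $reads * 1_000_000;
        }
    }
    return (\@names, \%sum);
}

# Tiny in-memory sample (excerpt of the data shown above, fewer columns).
my $fasta = ">1\nCCTCCTCTACCTCATCCCAGTT\n>1\nGGGTTCGATTCCCGGTCAGGGAT\n";
my @rows = (
    [ 'source_tRNA', '5p-tRFs (fractionated)', '5p-tRFs (absolute)',
      'misc-tRFs (fractionated)', 'misc-tRFs (absolute)' ],
    [ 'MT-TM',                   0,   0, 0,                0  ],
    [ 'MT-TM-ENSG00000210112.1', 9,   9, 40.8333333333333, 43 ],
    [ 'tRNA-Ala-AGC-8-1',        0.5, 1, 9.99444444444444, 39 ],
    [ 'tRNA-Ala-AGC-8-2',        0.5, 1, 9.99444444444444, 39 ],
);
my $table = join '', map { join("\t", @$_) . "\n" } @rows;

open my $seq_fh, '<', \$fasta or die "Cannot read sample sequences: $!";
my $reads = count_reads($seq_fh);
close $seq_fh;

open my $tab_fh, '<', \$table or die "Cannot read sample table: $!";
my ($cols, $sum) = merge_table($tab_fh, $reads);
close $tab_fh;

# Print the merged table, one line per combination.
print join("\t", 'source_tRNA', @$cols), "\n";
for my $key (sort keys %$sum) {
    print join("\t", $key, map { sprintf '%.2f', $_ } @{ $sum->{$key} }), "\n";
}
```

For the real files, the two in-memory filehandles would be replaced by handles opened on the ".mapped_sequences" file and on "unitas.tRF-table.txt" in each folder, and the print statements would write to the merged output file instead of STDOUT.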

Regarding the indentation style: what would be a common one? I have to admit I only know this one. My professor gave me a book to help me find my way into Perl, and this was the style used there, so I kind of stuck with it.

Once again, thank you already for the super quick replies. I am very glad I found so much help so quickly ~Panda