
Thank you all very much for the super fast replies. I will work through the tips and tricks and post an update as soon as I can.

As requested, here is a more detailed description of my problem: I have several folders, each containing two specific files: one whose name ends in ".mapped_sequences" and one that is always named "unitas.tRF-table.txt".
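To loop over all sample folders, a minimal sketch could look like the following. It assumes every sample folder sits directly inside one base folder (given as the first command-line argument, defaulting to the current directory); `find_sample_files` is a hypothetical name, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: look at every subfolder of $base_dir and return one
# record per folder that contains both a *.mapped_sequences file and
# unitas.tRF-table.txt. Folders missing either file are skipped.
sub find_sample_files {
    my ($base_dir) = @_;
    opendir my $dh, $base_dir or die "Cannot open $base_dir: $!";
    my @entries = readdir $dh;
    closedir $dh;

    my @samples;
    for my $entry (sort @entries) {
        next if $entry eq '.' || $entry eq '..';
        my $dir = File::Spec->catdir($base_dir, $entry);
        next unless -d $dir;

        # readdir instead of glob, so folder names with spaces are safe
        opendir my $sub, $dir or next;
        my @hits = grep { /\.mapped_sequences\z/ } readdir $sub;
        closedir $sub;

        my $table = File::Spec->catfile($dir, 'unitas.tRF-table.txt');
        next unless @hits && -f $table;
        my ($mapped) = sort @hits;    # if several match, take the first by name
        push @samples, {
            dir    => $dir,
            mapped => File::Spec->catfile($dir, $mapped),
            table  => $table,
        };
    }
    return @samples;
}

my $base = @ARGV ? $ARGV[0] : '.';
for my $s (find_sample_files($base)) {
    print "$s->{dir}\n  $s->{mapped}\n  $s->{table}\n";
}
```

Returning a list of hashes keeps the two file paths of one sample together, so the later steps can process one folder at a time.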

The .mapped_sequences file looks like this, always a line with a number followed by a line with a gene sequence:

>1
CCTCCTCTACCTCATCCCAGTT
>1
GGGTTCGATTCCCGGTCAGGGAT
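Summing the reads from such a file only needs the `>` lines. This sketch assumes the number after each `>` is the read count of the sequence on the following line, so the two entries shown above would give a total of 2; `count_reads` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the numbers on the ">N" lines of a .mapped_sequences file.
# Assumption: N is the read count of the sequence on the next line.
sub count_reads {
    my ($fh) = @_;
    my $total = 0;
    while (my $line = <$fh>) {
        # leading whitespace is tolerated; sequence lines never match
        $total += $1 if $line =~ /^\s*>(\d+)\s*$/;
    }
    return $total;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    printf "%s\t%d\n", $file, count_reads($fh);
    close $fh;
}
```

Passing a filehandle instead of a file name makes the subroutine easy to test with an in-memory string.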

The other file looks like this (the four header lines are left out, and only a few example lines are shown because the whole file is rather big):

source_tRNA    5p-tR-halves (fractionated)    5p-tR-halves (absolute)    5p-tRFs (fractionated)    5p-tRFs (absolute)    3p-tR-halves (fractionated)    3p-tR-halves (absolute)    3p-CCA-tRFs (fractionated)    3p-CCA-tRFs (absolute)    3p-tRFs (fractionated)    3p-tRFs (absolute)    tRF-1 (fractionated)    tRF-1 (absolute)    tRNA-leader (fractionated)    tRNA-leader (absolute)    misc-tRFs (fractionated)    misc-tRFs (absolute)
MT-TL2    0    0    0    0    0    0    0    0    0    0    0    0    6.16666666666667    18    0    0
MT-TL2-ENSG00000210191.1    1    1    4    4    0    0    0    0    0    0    0    0    0    0    124    124
MT-TM    0    0    0    0    0    0    0    0    0    0    6    6    0    0    0    0
MT-TM-ENSG00000210112.1    13    13    9    9    0    0    0    0    0    0    0    0    0    0    40.8333333333333    43
MT-TN    0    0    0    0    0    0    0    0    0    0    1.5    3    2    2    0    0
MT-TN-ENSG00000210135.1    0    0    1    1    0    0    0    0    0    0    0    0    0    0    25.25    26
MT-TP    0    0    0    0    0    0    0    0    0    0    2    2    0    0    0    0
tRNA-Ala-AGC-1-1    0    0    0.142857142857143    1    0    0    0    0    0    0    0    0    0    0    1.21693121693122    10
tRNA-Ala-AGC-11-1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    9.99444444444444    39
tRNA-Ala-AGC-15-1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    4.26111111111111    21
tRNA-Ala-AGC-2-1    0    0    0.166666666666667    2    0    0    0    0    0.0909090909090909    1    0    0    0    0    1.53835978835979    12
tRNA-Ala-AGC-2-2    0    0    0.166666666666667    2    0    0    0    0    0.0909090909090909    1    0    0    0    0    1.53835978835979    12
tRNA-Ala-AGC-3-1    0    0    0.166666666666667    2    0    0    0    0    0.0909090909090909    1    0    0    0    0    1.21693121693122    10
tRNA-Ala-AGC-4-1    0    0    5.75    46    0    0    0    0    0    0    0    0    0    0    1.17407407407407    13
tRNA-Ala-AGC-5-1    0    0    0.166666666666667    2    0    0    0    0    0    0    0    0    0    0    1.21693121693122    10
tRNA-Ala-AGC-6-1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    2    2
tRNA-Ala-AGC-7-1    0    0    0.166666666666667    2    0    0    0    0    0    0    0    0    0    0    1.53835978835979    12
tRNA-Ala-AGC-8-1    0    0    0.5    1    0    0    0    0    0    0    0    0    0    0    9.99444444444444    39
tRNA-Ala-AGC-8-2    0    0    0.5    1    0    0    0    0    0    0    0    0    0    0    9.99444444444444    39
tRNA-Ala-AGC-9-1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0.511111111111111    3
tRNA-Ala-AGC-9-2    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0.511111111111111    3
tRNA-Ala-CGC-1-1    0    0    5.75    46    0    0    0    0    0    0    0    0    0    0    5.84074074074074    21
tRNA-Ala-CGC-2-1    0    0    5.75    46    0    0    0    0    0    0    19    19    1    1    4.75740740740741    21
tRNA-Ala-CGC-3-1    0    0    5.75    46    0    0    0    0    0    0    10    10    0    0    6.07407407407407    8
tRNA-Ala-CGC-4-1    0    0    0.166666666666667    2    0    0    0    0    0    0    0    0    0    0    1.28835978835979    11
tRNA-Ala-TGC-1-1    0    0    0.166666666666667    2    0    0    0    0    0.0909090909090909    1    0    0    0    0    5.12645502645503    24
tRNA-Ala-TGC-2-1    0    0    5.75    46    0    0    0    0    0    0    0    0    0    0    5.12645502645503    24
tRNA-Ala-TGC-3-1    0    0    5.75    46    0    0    0    0    0    0    0    0    0    0    29.2931216931217    74
tRNA-Ala-TGC-3-2    0    0    5.75    46    0    0    0    0    0    0    0    0    0    0    29.2931216931217    74
tRNA-Ala-TGC-4-1    0    0    5.75    46    0    0    0    0    0    0    0    0    0    0    95.7097883597884    113
tRNA-Ala-TGC-5-1    0    0    0.166666666666667    2    0    0    0    0    0    0    0    0    0    0    2.20978835978836    17
tRNA-Ala-TGC-6-1    0    0    0.166666666666667    2    0    0    0    0    0.0909090909090909    1    0    0    0    0    0.0740740740740741    2
tRNA-Ala-TGC-7-1    0    0    0.166666666666667    2    0    0    0    0    0    0    0    0    0    0    2.20978835978836    17
tRNA-Arg-ACG-1-1    0    0    0.2    2    0    0    0.142857142857143    1    0    0    13    13    0    0    9.83333333333333    95
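For reading this table, one option is to split each line on tabs and keep the column names from the header line. This sketch assumes the columns are tab-separated (the spaces inside names such as `5p-tR-halves (fractionated)` make splitting on plain whitespace unreliable) and that the header is the line starting with `source_tRNA`, so the omitted preamble lines are skipped without hard-coding their count. `read_trf_table` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a unitas.tRF-table.txt-style file from an open filehandle.
# Assumptions: columns are tab-separated, and the column-header line is the
# one starting with "source_tRNA"; everything before it is skipped.
# Returns (\@column_names, \@rows), where each row is [$name, \@values].
sub read_trf_table {
    my ($fh) = @_;
    my (@columns, @rows);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;                 # tolerate Windows line endings
        next unless $line =~ /\S/;         # skip blank lines
        my ($name, @fields) = split /\t/, $line;
        if (!@columns) {                   # still before the header line
            @columns = @fields if $name eq 'source_tRNA';
            next;
        }
        push @rows, [ $name, \@fields ];
    }
    die "No 'source_tRNA' header line found\n" unless @columns;
    return (\@columns, \@rows);
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ($columns, $rows) = read_trf_table($fh);
    close $fh;
    printf "%d data columns, %d rows\n", scalar @$columns, scalar @$rows;
}
```

Keeping the column names means the "(fractionated)" columns can later be picked by name rather than by position.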

So the first task was to add up all the numbers from the first file (the reads). That is the one part I got working, and it does this correctly for all the files. The next task is to recalculate the numbers in the second file (number/reads*1000000) and afterwards sum them. As you can see from the second example, there are multiple lines for the same amino-acid combination, and all lines for one combination should be summed together and saved in a new, more organized file (the merged values, keeping only the fractionated columns). I hope I managed to explain what this script should do.
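The normalization, grouping and column selection could be combined roughly as follows. Assumptions, all open to change: the "amino-acid combination" is the name with its trailing `-N-N` removed (so `tRNA-Ala-AGC-1-1` and `tRNA-Ala-AGC-11-1` both become `tRNA-Ala-AGC`), an `-ENSG...` suffix is dropped as well (which merges `MT-TL2` with `MT-TL2-ENSG00000210191.1`), and the fractionated columns are recognized by their header name. The subroutine names and the read count in the demo are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical grouping rule: drop the trailing gene-copy numbers and any
# Ensembl suffix, so all copies of one amino acid/anticodon share a key.
sub group_key {
    my ($name) = @_;
    (my $key = $name) =~ s/-\d+-\d+\z//;    # tRNA-Ala-AGC-1-1 -> tRNA-Ala-AGC
    $key =~ s/-ENSG\d+(?:\.\d+)?\z//;       # MT-TL2-ENSG00000210191.1 -> MT-TL2
    return $key;
}

# $columns: arrayref of column names (without source_tRNA)
# $rows:    arrayref of [$name, \@values]
# $reads:   total read count of the sample
# Returns (\@kept_names, \%sums), where $sums{$key}[$i] is the summed
# value/reads*1000000 for the i-th "(fractionated)" column.
sub summarize {
    my ($columns, $rows, $reads) = @_;
    die "Read count must be positive\n" unless $reads > 0;
    my @keep = grep { $columns->[$_] =~ /\(fractionated\)/ } 0 .. $#$columns;
    my %sums;
    for my $row (@$rows) {
        my ($name, $values) = @$row;
        my $key = group_key($name);
        for my $i (0 .. $#keep) {
            my $value = $values->[ $keep[$i] ] // 0;
            $sums{$key}[$i] += $value / $reads * 1_000_000;
        }
    }
    return ([ @{$columns}[@keep] ], \%sums);
}

# Write one tab-separated line per group, sorted by name.
sub write_summary {
    my ($out, $kept, $sums) = @_;
    print {$out} join("\t", 'tRNA', @$kept), "\n";
    for my $key (sort keys %$sums) {
        print {$out} join("\t", $key, map { sprintf('%.4f', $_) } @{ $sums->{$key} }), "\n";
    }
}

# Demo: first four columns of three rows from the table above and a
# made-up read count of 2,000,000.
my @columns = ('5p-tR-halves (fractionated)', '5p-tR-halves (absolute)',
               '5p-tRFs (fractionated)',      '5p-tRFs (absolute)');
my @rows = (
    [ 'tRNA-Ala-AGC-8-1', [ 0, 0, 0.5,  1  ] ],
    [ 'tRNA-Ala-AGC-8-2', [ 0, 0, 0.5,  1  ] ],
    [ 'tRNA-Ala-CGC-1-1', [ 0, 0, 5.75, 46 ] ],
);
my ($kept, $sums) = summarize(\@columns, \@rows, 2_000_000);
write_summary(\*STDOUT, $kept, $sums);
```

With the demo values, `tRNA-Ala-AGC-8-1` and `tRNA-Ala-AGC-8-2` are merged into one `tRNA-Ala-AGC` line (0.0000 and 0.5000), and `tRNA-Ala-CGC` gets 0.0000 and 2.8750. Normalizing each row before summing gives the same result as summing first and normalizing afterwards, because every row of one sample is divided by the same read count.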

Regarding the indentation style: what would be a common one? I have to admit I only know this one. My professor gave me a book to help me find my way into Perl, and that was the style it used, so I kind of stuck with it.

Once again, thank you all for the super quick replies. I am very glad I found so much help so quickly. ~Panda

In reply to Re: Skript help needed - RegEx & Hashes by PandaRaey
in thread Skript help needed - RegEx & Hashes by PandaRaey
