This is my very first time using Perl and I'm wondering if someone could show me how to perform the following task (which will form part of a larger pipeline I'm building). I'm hoping that seeing how to do this in Perl will give me a foothold/starting point into the language and allow me to build the pipeline from there. The task in question:
I have a text file containing approximately 15,000 records. Each record looks like this:
TF Unknown
TF Name Unknown
Gene ENSG00000113916
Motif ENSG00000113916___1|2x3
Family C2H2 ZF
Species Homo_sapiens
Pos A C G T
1 0.538498 0.157305 0.157633 0.146564
2 0.072844 0.008771 0.877166 0.0412175
3 0.959269 0.013107 0.015961 0.0116621
4 0.852439 0.023883 0.016813 0.106864
5 0.57332 0.068801 0.181385 0.176494
6 0.139513 0.074798 0.737607 0.0480813
7 0.735484 0.091299 0.09091 0.0823067
8 0.79932 0.027041 0.137306 0.0363319
9 0.16103 0.12536 0.109938 0.603672
10 0.622356 0.06782 0.115463 0.194361
For the rows explicitly numbered 1 to 10, I need to find the highest of the four values in each row (all are <1.0) and output the letter heading that column (a DNA base). For example, row 1 in the above matrix gives A. Ultimately I need to produce a two-column list: the first column holds the "motif name" from row four of the record, and the second holds the string of 10 characters from the matrix analysis, e.g.
ENSG00000113916___1|2x3 AGAAAGAATA
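One possible way to approach this is sketched below in Perl. It is only a starting point and rests on assumptions the post does not confirm: that each field of a record sits on its own line, that fields are separated by whitespace, that every record has a "Motif" line followed by a "Pos A C G T" header, and that ties go to the first (leftmost) base. The sub name motif_consensus is made up for illustration, and the sample data is read from an in-memory string so the script runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read records from a filehandle and return a list of [motif, consensus] pairs.
# Assumed layout: a "Motif <name>" line, a "Pos A C G T" header line, then
# matrix rows that start with an integer position.
sub motif_consensus {
    my ($fh) = @_;
    my ( @results, $motif, @bases );
    my $consensus = '';

    # Store the record collected so far, if it produced any bases.
    my $flush = sub {
        push @results, [ $motif, $consensus ]
            if defined $motif && length $consensus;
        $consensus = '';
    };

    while ( my $line = <$fh> ) {
        my @fields = split ' ', $line;    # split on any whitespace
        next unless @fields;              # skip blank lines

        if ( $fields[0] eq 'Motif' ) {
            $flush->();                   # a new record starts here
            $motif = $fields[1];
        }
        elsif ( $fields[0] eq 'Pos' ) {
            @bases = @fields[ 1 .. $#fields ];    # column headings: A C G T
        }
        elsif ( $fields[0] =~ /^\d+$/ && @bases ) {
            my @values = @fields[ 1 .. $#fields ];
            next unless @values == @bases;        # ignore malformed rows

            # Find the index of the largest value; the first one wins a tie.
            my $best = 0;
            for my $i ( 1 .. $#values ) {
                $best = $i if $values[$i] > $values[$best];
            }
            $consensus .= $bases[$best];
        }
    }
    $flush->();                           # do not forget the last record
    return @results;
}

# Sample record copied from the question.
my $sample = <<'END';
TF Unknown
TF Name Unknown
Gene ENSG00000113916
Motif ENSG00000113916___1|2x3
Family C2H2 ZF
Species Homo_sapiens
Pos A C G T
1 0.538498 0.157305 0.157633 0.146564
2 0.072844 0.008771 0.877166 0.0412175
3 0.959269 0.013107 0.015961 0.0116621
4 0.852439 0.023883 0.016813 0.106864
5 0.57332 0.068801 0.181385 0.176494
6 0.139513 0.074798 0.737607 0.0480813
7 0.735484 0.091299 0.09091 0.0823067
8 0.79932 0.027041 0.137306 0.0363319
9 0.16103 0.12536 0.109938 0.603672
10 0.622356 0.06782 0.115463 0.194361
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
print join( "\t", @$_ ), "\n" for motif_consensus($fh);
close $fh;
```

For the sample record this prints the motif name and AGAAAGAATA separated by a tab. To run it on the real data, the in-memory open could be replaced with an open on the actual file name; if the real file keeps a whole record on a single line, the parsing would need to change accordingly.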
Thank you, any help is sincerely appreciated!
In reply to First foray into Perl by LostWeekender