General Help/Advice Needed

PandaRaey has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone, I am super new to Perl, and just as I was slowly starting to get the hang of it, I got sick. Since then I have been on a health break for around four months. You can imagine that I am now once again completely lost when it comes to scripting. I need to pick up my work again now, but I am honestly lost and a bit overwhelmed, if not even scared, about how and where I should start. I tried reading through some tutorials, but sadly those only worsened my anxiety about scripting.

All I am able to provide is some pseudo-code, or something like it, that I wrote down in order to see what my new script needs to do. However, my first struggle is already fitting the regular expression in the first step. I do NOT expect anyone to write the script for me, as I would love to be able to work on it myself. But I believe I do need some pushes in the right direction and some advice on how to tackle this script. Thank you very much in advance.

1. Create a header with all the IDs
- open/create file
- start header file with "Source tRNA"
- get ID via regex UNI_DATE_ID_ENDING_#NO
- add ID to file
- close file

2. Create new files
- Titles:
5p-tR-halves
5p-tRFs
p-tR-halves
3p-CCA-tRFs
3p-tRFs
tRF-1
tRNA-leader
misc-tRFs

- add first column to each file:
MT-TA
MT-TC
MT-TD
MT-TE
...

3.
- print the corresponding columns from each mergerpt file from each folder into the new file
- column assignment:
5p-tR-halves - 1
5p-tRFs - 2
p-tR-halves - 3
3p-CCA-tRFs - 4
3p-tRFs - 5
tRF-1 - 6
tRNA-leader - 7 
misc-tRFs - 8

4. Add the ID header to each file

I have multiple folders and they all contain a file with the same name ("merge") which I am working with. The file roughly looks like that:

MT-TA    5.36272153463324    21.4508861385329    8.04408230194985    34.857689975116    0    0    0    13.4068038365831
MT-TC    24.1322469058496    160.881646038997    48.2644938116991    37.5390507424327    45.5831330443825    0    0    104.573069925348
MT-TD    10.7254430692665    10.7254430692665    0    2.68136076731662    0    176.969810642897    1.34068038365831    445.105887374559

The goal is to have new files that each hold one column from these merge files. So the first new file should contain just the 2nd column (because the first is the source name) from all "merge" files; the next just the 3rd column from all files, the next just the 4th, and so on.

Any help or advice on how I can start working on this, or where I can turn to find what I need to write this script, is much appreciated. ~Panda


Replies are listed 'Best First'.
Re: General Help/Advice Needed by 1nickt (Canon) on Feb 26, 2019 at 14:12 UTC
Hi, welcome back to Perl, the One True Religion. Glad you are feeling better. Start here: perlintro. Write several really simple programs. Much, much simpler than what you have sketched out. Acquire basic familiarity with Perl and its idioms and techniques. Then gradually expand your programs to approach what you want to do. Most importantly, write a test for everything your program does. See Test::More. Hope this helps!

The way forward always starts with a minimal test.
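To make the "write a test for everything" advice concrete, here is a minimal sketch using Test::More. The `split_row` helper is a hypothetical name that does not appear in the thread; it only assumes that a row is a source name followed by whitespace-separated values, as in the sample data from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical helper (not from the thread): split one row into the
# source name and the list of values that follow it.
sub split_row {
    my ($line) = @_;
    my ( $name, @values ) = split ' ', $line;
    return ( $name, @values );
}

# Shortened, made-up row in the same shape as the "merge" sample.
my ( $name, @values ) = split_row("MT-TA 5.36 21.45 8.04\n");

is( $name,          'MT-TA', 'first field is the source name' );
is( scalar @values, 3,       'three values follow the name' );
is( $values[1],     '21.45', 'second value is kept unchanged' );
```

Starting from a test this small keeps each step checkable: once `split_row` behaves, the next test can cover reading a whole file, and so on.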
Re: General Help/Advice Needed by NetWallah (Canon) on Feb 27, 2019 at 00:38 UTC
I'm not sure if I understood all your requirements, but here is some WORKING code to get you started.

#!/usr/bin/perl
use strict;
use warnings;
#https://perlmonks.org/?node_id=1230556

#Normal open & process Input file would be
# open my $inp, "<", "Inp-file-name";
# while (<$inp>){
#    .. process One record , now located in $_
# }
# close $inp;

my @New_file = ( # This tracks info for each output file
    {FILENAME=>"File-for-col1.txt", FH=>undef},
    {FILENAME=>"File-for-col2.txt", FH=>undef},
);
for my $file(@New_file){
    open $file->{FH}, ">", $file->{FILENAME} or die "ERROR: Cannot open $file->{FILENAME}: $!";
}
while (<DATA>){
    my ($header, @columns) = split ;
    print "DEBUG: hdr=$header .. got ", scalar (@columns), " columns\n";
    my $col_idx=0;
    for my $file(@New_file){
        print {$file->{FH}} $columns[$col_idx],"\n";
        $col_idx++;
    }
}
for my $file(@New_file){
    close $file->{FH};
}
__DATA__
MT-TA 5.36272153463324 21.4508861385329 8.04408230194985 34.857689975116 0 0 0 13.4068038365831
MT-TC 24.1322469058496 160.881646038997 48.2644938116991 37.5390507424327 45.5831330443825 0 0 104.573069925348
MT-TD 10.7254430692665 10.7254430692665 0 2.68136076731662 0 176.969810642897 1.34068038365831 445.105887374559

As a computer, I find your faith in technology amusing.
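The code above reads a single input. The question also mentions several folders that each contain a "merge" file, so one possible next step is sketched here under stated assumptions: the folder names (`sampleA`, `sampleB`) and the values are made up, the folder name stands in for the ID that the question wants to extract by regex, and `render_column` is a hypothetical helper. The sketch builds its own throwaway folders so it can run anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Build two throwaway folders, each holding a small "merge" file.
# Folder names and values are invented for this sketch.
my $root   = tempdir( CLEANUP => 1 );
my %sample = (
    sampleA => "MT-TA\t1\t2\nMT-TC\t3\t4\n",
    sampleB => "MT-TA\t5\t6\nMT-TC\t7\t8\n",
);
my @folders = sort keys %sample;
for my $folder (@folders) {
    my $dir = File::Spec->catdir( $root, $folder );
    mkdir $dir or die "Cannot create $dir: $!";
    my $path = File::Spec->catfile( $dir, 'merge' );
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $sample{$folder};
    close $out or die "Cannot close $path: $!";
}

# $table{column index}{source name} = [ one value per folder ]
my %table;
for my $folder (@folders) {
    my $path = File::Spec->catfile( $root, $folder, 'merge' );
    open my $in, '<', $path or die "Cannot read $path: $!";
    while ( my $line = <$in> ) {
        my ( $name, @columns ) = split ' ', $line;
        next unless defined $name;    # skip blank lines
        for my $idx ( 0 .. $#columns ) {
            push @{ $table{$idx}{$name} }, $columns[$idx];
        }
    }
    close $in;
}

# Hypothetical helper: build the text for one output file, with a header
# line followed by one row per source name.
sub render_column {
    my ($idx) = @_;
    my @lines = ( join "\t", 'Source tRNA', @folders );
    for my $name ( sort keys %{ $table{$idx} } ) {
        push @lines, join "\t", $name, @{ $table{$idx}{$name} };
    }
    return join '', map { "$_\n" } @lines;
}

print render_column(0);
```

This sketch assumes every "merge" file lists the same source names. If one folder lacks a name, the values in that row shift left, so real data would need a check for missing entries.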
Re: General Help/Advice Needed by karlgoethebier (Abbot) on Feb 26, 2019 at 19:28 UTC
Begin with open.

Regards, Karl

«The Crux of the Biscuit is the Apostrophe»

`perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'`
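As a small illustration of that starting point, the sketch below uses the three-argument form of `open` with a lexical filehandle and reads a file line by line. The scratch file and its two rows are made up so the example runs on its own; in real use `$path` would point at one of the "merge" files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file with invented contents, so the sketch is self-contained.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "MT-TA\t5.3\nMT-TC\t24.1\n";
close $tmp_fh or die "Cannot close $path: $!";

# Three-argument open; always check whether it succeeded.
open my $in, '<', $path or die "Cannot open $path: $!";
my @names;
while ( my $line = <$in> ) {
    chomp $line;
    my ($name) = split ' ', $line;    # first whitespace-separated field
    push @names, $name;
}
close $in;

print "$_\n" for @names;
```

Checking the return value of `open` with `or die` and including `$!` in the message shows right away when a path is wrong, which is the most common first stumbling block.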