Re^2: Skript help needed - RegEx & Hashes

Here's how I might have written that script, with the following changes to your version:

Formatting: I've used my personal formatting style; a matter of taste of course (and sometimes I even vary my own style, if I think it looks better another way). For example, I added a bunch of whitespace and removed a couple of parens where it's not strictly necessary (but you are free to add parens if you like). I wrapped a few long lines so they would display nicely here, but usually I write my open ... or die ... on one line if it fits reasonably.
Don't need both #!/usr/bin/perl -w and use warnings; (What's wrong with -w and $^W)
I used the glob suggestion from jwkrahn, and also made sure that it would only return directories with the -d filetest operator. Note that glob has quite a few caveats, but with fixed strings it's ok.
I applied most of the changes suggested by poj and others.
Just to shorten the code a bit, I used an intermediate hash reference $h for $hash{$tRNAname}, in order for this to work I had to make sure to initialize $hash{$tRNAname} with an empty anonymous hash: my $h = ( $hash{$tRNAname} //= {} ) means "assign {} (an empty anonymous hash) to $hash{$tRNAname} if the latter is not yet defined, then assign the value of $hash{$tRNAname} to $h". (See also perlreftut and perlref.) Update: And see hippo's reply for one way to shorten it even more.
You were closing the directory handle too early, and I had to change the scoping of a couple of variables like $tRNAname.
I switched to using Data::Dumper to output the variables, which I configured in a way that I like the output better (although normally I'd use Data::Dump; Date::Dumper is a core module). BTW, I'm not sure why you were prefixing the \n in your prints, but normally one would do things like print "$tocount\n";
You said open my $merge,">>","$folder/$merge", but the latter variable doesn't yet exist at that point (my $merge doesn't take effect until after the open statement), and in your original script you said open(MERGE,">merge"), so I'm not sure if you want a merge file per folder, or a single merge file in the current working directory? If it's the former, the probably hippo's suggestion of opening the file once at the top of the script is better, also then you don't have to use append mode.
I'm not sure about if ( $tocount =~ /[0-9]/ ): If you want to make sure that it contains only digits, you should anchor your regex, as in ~~/^[0-9]$/~~ /^[0-9]+$/.
Plus I made a few other tweaks and used idioms in a few places, such as ( $tRNAname = $line[0] ) =~ s/-ENS.+$//, which means "copy $line[0] to $tRNAname and then apply the regex to $tRNAname".
{$a cmp $b} is the default sort order and isn't really needed, unless you really want to be explicit (it doesn't hurt).

Please have a look, and if you have any questions, please let us know.

#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
$Data::Dumper::Quotekeys = 0;
$Data::Dumper::Sortkeys = 1;

for my $folder ( grep {-d} glob('UNITAS_*') ) {
    print Data::Dumper->Dump([$folder], [qw/folder/]);
    opendir my $dh, $folder or die "$folder: $!";
    while ( my $file = readdir($dh) ) {
        next if $file !~ /\.mapped_sequences$/;
        print Data::Dumper->Dump([$file], [qw/file/]);
        
        my $reads = 0;
        open my $fileone, '<', "$folder/$file"
            or die "$folder/$file: $!";
        while ( my $tocount = <$fileone> ) {
            chomp $tocount;
            $tocount =~ s/>//g;
            next if $tocount =~ /[A-Za-z]/;
            if ( $tocount =~ /[0-9]/ ) {
                print Data::Dumper->Dump([$tocount], [qw/tocount/]);
                $reads += $tocount;
            }
        }
        close $fileone;
        print Data::Dumper->Dump([$reads], [qw/reads/]);
        
        my %hash;
        my $trftable = 'unitas.tRF-table.txt';
        open my $trf, '<', "$folder/$trftable"
            or die "$folder/$trftable: $!";
        <$trf> for 1 .. 4;
        while ( my $line = <$trf> ) {
            chomp $line;
            my @line = split /\t/, $line;
            #print Data::Dumper->Dump([\@line], [qw/*line/]);
            my $tRNAname;
            if ( $line[0] =~ s/tRNA-[^-]+-...// )
                { $tRNAname = $& }
            else
                { ( $tRNAname = $line[0] ) =~ s/-ENS.+$// }
            print Data::Dumper->Dump([$tRNAname], [qw/tRNAname/]);
            my $h = ( $hash{$tRNAname} //= {} );
            $h->{"5p-tR-halves"} += $line[ 1] / $reads * 1000000;
            $h->{"5p-tRFs"}      += $line[ 3] / $reads * 1000000;
            $h->{"3p-tR-halves"} += $line[ 5] / $reads * 1000000;
            $h->{"3p-CCA-tRFs"}  += $line[ 7] / $reads * 1000000;
            $h->{"3p-tRFs"}      += $line[ 9] / $reads * 1000000;
            $h->{"tRF-1"}        += $line[11] / $reads * 1000000;
            $h->{"tRNA-leader"}  += $line[13] / $reads * 1000000;
            $h->{"misc-tRFs"}    += $line[15] / $reads * 1000000;
        }
        close $trf;
        print Data::Dumper->Dump([\%hash], [qw/*hash/]);
        
        open my $merge, '>>', "$folder/merge"
            or die "$folder/merge: $!";
        my @tRF_types = ("5p-tR-halves", "5p-tRFs", "3p-tR-halves",
            "3p-CCA-tRFs", "3p-tRFs", "tRF-1", "tRNA-leader",
            "misc-tRFs");
        for my $tRNAname ( sort keys %hash ) {
            print $merge $tRNAname;
            for my $tRF_type (@tRF_types) {
                print $merge "\t$hash{$tRNAname}{$tRF_type}";
            }
            print $merge "\n";
        }
        close $merge;
    }
    close $dh;
}
[download]

For the sample data from this post, the output file merge I get is the following. Note that if you re-run the script, because of the append mode on the merge file, the same lines get added to that file again.

MT-TM    6500000    4500000    0    0    0    0    0    20416666.66666
+66
MT-TN    0    500000    0    0    0    750000    1000000    12625000
MT-TP    0    0    0    0    0    1000000    0    0
tRNA-Ala-AGC    0    3863095.23809524    0    0    136363.636363636   
+ 0    0    23353306.8783069
tRNA-Ala-CGC    0    8708333.33333333    0    0    0    14500000    50
+0000    8980291.00529101
tRNA-Ala-TGC    0    11833333.3333333    0    0    90909.0909090909   
+ 0    0    84521296.2962963
tRNA-Arg-ACG    0    100000    0    71428.5714285715    0    6500000  
+  0    4916666.66666667
[download]

Update: Minor edits and a few additions to the explanations.

Comment on Re^2: Skript help needed - RegEx & Hashes Select or Download Code

Replies are listed 'Best First'.
Re^3: Script help needed - RegEx & Hashes (DRY) by hippo (Archbishop) on Oct 12, 2018 at 12:27 UTC
`$h->{"5p-tR-halves"} += $line[ 1] / $reads * 1000000; $h->{"5p-tRFs"} += $line[ 3] / $reads * 1000000; $h->{"3p-tR-halves"} += $line[ 5] / $reads * 1000000; $h->{"3p-CCA-tRFs"} += $line[ 7] / $reads * 1000000; $h->{"3p-tRFs"} += $line[ 9] / $reads * 1000000; $h->{"tRF-1"} += $line[11] / $reads * 1000000; $h->{"tRNA-leader"} += $line[13] / $reads * 1000000; $h->{"misc-tRFs"} += $line[15] / $reads * 1000000;` [download] Years ago that would not have bothered me but nowadays it makes me twitch. YMMV but for DRY: `my $i = 1; for (qw/5p-tR-halves 5p-tRFs 3p-tR-halves 3p-CCA-tRFs 3p-tRFs tRF-1 tR +NA-leader misc-tRFs/) { $->{$_} += $line[$i] / $reads * 1000000; $i += 2; # Odd entries only }` [download]	[reply] [d/l] [select]
Re^4: Script help needed - RegEx & Hashes (DRY) by haukex (Archbishop) on Oct 12, 2018 at 12:30 UTC
I thought about doing something like that, but ended up leaving it out - I should've added it, since I agree with you!	[reply]