Here's how I might have written that script, with the following changes to your version:
- Formatting: I've used my personal formatting style; a matter of taste of course (and sometimes I even vary my own style, if I think it looks better another way). For example, I added a bunch of whitespace and removed a couple of parens where it's not strictly necessary (but you are free to add parens if you like). I wrapped a few long lines so they would display nicely here, but usually I write my open ... or die ... on one line if it fits reasonably.
- Don't need both #!/usr/bin/perl -w and use warnings; (What's wrong with -w and $^W)
- I used the glob suggestion from jwkrahn, and also made sure that it would only return directories with the -d filetest operator. Note that glob has quite a few caveats, but with fixed strings it's ok.
- I applied most of the changes suggested by poj and others.
- Just to shorten the code a bit, I used an intermediate hash reference $h for $hash{$tRNAname}, in order for this to work I had to make sure to initialize $hash{$tRNAname} with an empty anonymous hash: my $h = ( $hash{$tRNAname} //= {} ) means "assign {} (an empty anonymous hash) to $hash{$tRNAname} if the latter is not yet defined, then assign the value of $hash{$tRNAname} to $h". (See also perlreftut and perlref.) Update: And see hippo's reply for one way to shorten it even more.
- You were closing the directory handle too early, and I had to change the scoping of a couple of variables like $tRNAname.
- I switched to using Data::Dumper to output the variables, which I configured in a way that I like the output better (although normally I'd use Data::Dump; Date::Dumper is a core module). BTW, I'm not sure why you were prefixing the \n in your prints, but normally one would do things like print "$tocount\n";
- You said open my $merge,">>","$folder/$merge", but the latter variable doesn't yet exist at that point (my $merge doesn't take effect until after the open statement), and in your original script you said open(MERGE,">merge"), so I'm not sure if you want a merge file per folder, or a single merge file in the current working directory? If it's the former, the probably hippo's suggestion of opening the file once at the top of the script is better, also then you don't have to use append mode.
- I'm not sure about if ( $tocount =~ /[0-9]/ ): If you want to make sure that it contains only digits, you should anchor your regex, as in
/^[0-9]$/ /^[0-9]+$/.
- Plus I made a few other tweaks and used idioms in a few places, such as ( $tRNAname = $line[0] ) =~ s/-ENS.+$//, which means "copy $line[0] to $tRNAname and then apply the regex to $tRNAname".
- {$a cmp $b} is the default sort order and isn't really needed, unless you really want to be explicit (it doesn't hurt).
Please have a look, and if you have any questions, please let us know.
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
$Data::Dumper::Quotekeys = 0;
$Data::Dumper::Sortkeys = 1;
for my $folder ( grep {-d} glob('UNITAS_*') ) {
print Data::Dumper->Dump([$folder], [qw/folder/]);
opendir my $dh, $folder or die "$folder: $!";
while ( my $file = readdir($dh) ) {
next if $file !~ /\.mapped_sequences$/;
print Data::Dumper->Dump([$file], [qw/file/]);
my $reads = 0;
open my $fileone, '<', "$folder/$file"
or die "$folder/$file: $!";
while ( my $tocount = <$fileone> ) {
chomp $tocount;
$tocount =~ s/>//g;
next if $tocount =~ /[A-Za-z]/;
if ( $tocount =~ /[0-9]/ ) {
print Data::Dumper->Dump([$tocount], [qw/tocount/]);
$reads += $tocount;
}
}
close $fileone;
print Data::Dumper->Dump([$reads], [qw/reads/]);
my %hash;
my $trftable = 'unitas.tRF-table.txt';
open my $trf, '<', "$folder/$trftable"
or die "$folder/$trftable: $!";
<$trf> for 1 .. 4;
while ( my $line = <$trf> ) {
chomp $line;
my @line = split /\t/, $line;
#print Data::Dumper->Dump([\@line], [qw/*line/]);
my $tRNAname;
if ( $line[0] =~ s/tRNA-[^-]+-...// )
{ $tRNAname = $& }
else
{ ( $tRNAname = $line[0] ) =~ s/-ENS.+$// }
print Data::Dumper->Dump([$tRNAname], [qw/tRNAname/]);
my $h = ( $hash{$tRNAname} //= {} );
$h->{"5p-tR-halves"} += $line[ 1] / $reads * 1000000;
$h->{"5p-tRFs"} += $line[ 3] / $reads * 1000000;
$h->{"3p-tR-halves"} += $line[ 5] / $reads * 1000000;
$h->{"3p-CCA-tRFs"} += $line[ 7] / $reads * 1000000;
$h->{"3p-tRFs"} += $line[ 9] / $reads * 1000000;
$h->{"tRF-1"} += $line[11] / $reads * 1000000;
$h->{"tRNA-leader"} += $line[13] / $reads * 1000000;
$h->{"misc-tRFs"} += $line[15] / $reads * 1000000;
}
close $trf;
print Data::Dumper->Dump([\%hash], [qw/*hash/]);
open my $merge, '>>', "$folder/merge"
or die "$folder/merge: $!";
my @tRF_types = ("5p-tR-halves", "5p-tRFs", "3p-tR-halves",
"3p-CCA-tRFs", "3p-tRFs", "tRF-1", "tRNA-leader",
"misc-tRFs");
for my $tRNAname ( sort keys %hash ) {
print $merge $tRNAname;
for my $tRF_type (@tRF_types) {
print $merge "\t$hash{$tRNAname}{$tRF_type}";
}
print $merge "\n";
}
close $merge;
}
close $dh;
}
For the sample data from this post, the output file merge I get is the following. Note that if you re-run the script, because of the append mode on the merge file, the same lines get added to that file again.
MT-TM 6500000 4500000 0 0 0 0 0 20416666.66666
+66
MT-TN 0 500000 0 0 0 750000 1000000 12625000
MT-TP 0 0 0 0 0 1000000 0 0
tRNA-Ala-AGC 0 3863095.23809524 0 0 136363.636363636
+ 0 0 23353306.8783069
tRNA-Ala-CGC 0 8708333.33333333 0 0 0 14500000 50
+0000 8980291.00529101
tRNA-Ala-TGC 0 11833333.3333333 0 0 90909.0909090909
+ 0 0 84521296.2962963
tRNA-Arg-ACG 0 100000 0 71428.5714285715 0 6500000
+ 0 4916666.66666667
Update: Minor edits and a few additions to the explanations.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.