
Hello everyone, I am really new to Perl. I only started recently, and so far I have tried to do as much as possible in bash and avoided Perl. However, my professor and I decided that Perl is the better and easier tool for this problem, so here I am. I also apologize for the long code below; I could not get the readmore tag to work to shorten the post.

I still have several problems with the script below:
- Everything up to the point where I print the reads works (yay)

- Everything after that does not. My main issue is that the content I expect is not printed into the merge file, and I have no idea what went wrong or where.

I already tried running the debugger, but without results, and everything else I tried only made things worse. My suspicion is that the tRNAname variables are not correct and that my handling of the multi-dimensional hash is not fully correct, but I cannot tell what is wrong or where. I am really concerned right now because I no longer know where to start.

Any help is much appreciated ~Panda

#!/usr/bin/perl -w

use strict;
use warnings;
#Declare all variables, hashes and co

my @folders;
my $folder;
my $file;
my $tocount;

my $reads;

my $trftable;
my $head;
my @line;
my $line;

my %hash;
my $tRNAname;
my @tRF_types;
my $tRF_type;



#Open folders in working directory

@folders=glob("*"); #to get all folders in directory; extension ("*") as wildcard to get all names
foreach$folder(@folders) #to speak to each element in directory
    {
    next if ($folder!~/^UNITAS_/); #skip elements which do not start with "UNITAS"
    opendir(DIR,$folder)||die print$!; #open folder, end script when opening is not possible (DIR is the "filehandle" for the directory)
    print"\n$folder";
    while($file=readdir(DIR)) #returns content of folder
        {
        next if($file!~/\.mapped_sequences$/); #get the mapped_sequences file we need to read out the reads
        print"\n$file"; #print out file names to make sure we get the right files

        $reads = 0; #set the number of reads to 0 for each run

        open(FILE,"$folder/$file")||die print$!; #open file
        while($tocount=<FILE>)#read file
            {
            $tocount =~ s/>//g; #remove all ">"
            next if ($tocount =~ /[A-Za-z]/); #skip lines which contain the sequence
                    
            if ($tocount =~ /[0-9]/) #get the read-number
                {
                print"\n$tocount";
                $reads = ($reads + $tocount); # add up all reads
                }
            print"\n$reads";

            }
        close FILE;

        $trftable = 'unitas.tRF-table.txt'; #save file name in variable
        open(TRF,"$folder/$trftable"); #open trf-table.txt

        $head=<TRF>; #remove the first four lines of the trf-table.txt file
        $head=<TRF>;
        $head=<TRF>;
        $head=<TRF>;

        %hash = (); #initiate empty hash

        while($line=<TRF>)
            {
            @line=split("\t",$trftable);
        
            if($line[0]=~s/tRNA-[^-]+-...//) # "tRNA-" matches the literal prefix and hyphen, "[^-]+" matches everything up to the next hyphen, "-..." matches that hyphen plus the next three characters
                {
                $tRNAname=$line[0];
                $tRNAname=$&; # "$&" = last pattern match
                print"\n$tRNAname";
                }
            else
                {
                $tRNAname=$line[0];
                $tRNAname=~s/-ENS.+$//; # "-ENS.+$" matches everything from "-ENS" to the end of the string
                print"\n$tRNAname";
                }
        
            $hash{$tRNAname}{"5p-tR-halves"}+=$line[1]/$reads*1000000;
            $hash{$tRNAname}{"5p-tRFs"}+=$line[3]/$reads*1000000;
            $hash{$tRNAname}{"3p-tR-halves"}+=$line[5]/$reads*1000000;
            $hash{$tRNAname}{"3p-CCA-tRFs"}+=$line[7]/$reads*1000000;
            $hash{$tRNAname}{"3p-tRFs"}+=$line[9]/$reads*1000000;
            $hash{$tRNAname}{"tRF-1"}+=$line[11]/$reads*1000000;
            $hash{$tRNAname}{"tRNA-leader"}+=$line[13]/$reads*1000000;
            $hash{$tRNAname}{"misc-tRFs"}+=$line[15]/$reads*1000000;
            }


        open(MERGE,">merge"); #open new file to save the sorted data in
        
        @tRF_types=("5p-tR-halves","5p-tRFs","3p-tR-halves","3p-CCA-tRFs","3p-tRFs","tRF-1","tRNA-leader","misc-tRFs");
        foreach$tRNAname(sort{$a cmp $b}keys%hash) #sorts the keys alphabetically
            {
            print MERGE $tRNAname; # print tRNA name
            foreach$tRF_type(@tRF_types) 
                {
                print MERGE"\t$hash{$tRNAname}{$tRF_type}"; # print counts for each tRF type separated by tab
                }
            print MERGE"\n";# print newline
            }

        close TRF;
        close MERGE;
        close DIR;
        }
    }
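
To make it easier to check the two parts I suspect (the tRNAname extraction and the multi-dimensional hash), they can be isolated from the file handling in a small stand-alone sketch. Everything here is an assumption for illustration: the rows are made up, make_row and trna_name are hypothetical helpers, and the column layout (counts in columns 1, 3, 5, ...) is only inferred from the indices used in the script, not from the real unitas.tRF-table.txt. The sketch splits the data line itself and uses a capture group instead of $&, which may or may not be what the full script needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds a tab-separated row with each count in an
# odd-numbered column (1, 3, 5, ...), as the indices in the script assume.
sub make_row {
    my ($id, @counts) = @_;
    return join "\t", $id, map { ($_, 0) } @counts;
}

# Hypothetical helper: reduce a full identifier to the name used as hash key.
sub trna_name {
    my ($id) = @_;
    # A capture group returns the matched text without modifying $id.
    return $1 if $id =~ /(tRNA-[^-]+-...)/;
    (my $name = $id) =~ s/-ENS.+$//;    # otherwise strip "-ENS..." to the end
    return $name;
}

my @tRF_types = ("5p-tR-halves", "5p-tRFs", "3p-tR-halves", "3p-CCA-tRFs",
                 "3p-tRFs", "tRF-1", "tRNA-leader", "misc-tRFs");

my $reads = 1_000_000;                  # made-up total read count

my @rows = (                            # made-up sample data
    make_row('tRNA-Ala-AGC-1-1', 10, 0, 0, 0, 0, 0, 0, 0),
    make_row('tRNA-Ala-AGC-2-1',  5, 2, 0, 0, 0, 0, 0, 0),
    make_row('MT-TA-ENSG000001',  3, 0, 0, 0, 0, 0, 0, 0),
);

my %rpm;                                # $rpm{name}{type} = reads per million
for my $row (@rows) {
    my @fields = split /\t/, $row;      # split the data line, not a file name
    my $name   = trna_name($fields[0]);
    for my $i (0 .. $#tRF_types) {
        # The count for type $i sits in column 2*$i+1 (1, 3, 5, ...).
        $rpm{$name}{ $tRF_types[$i] } += $fields[2 * $i + 1] / $reads * 1_000_000;
    }
}

# One line per tRNA name, one tab-separated value per tRF type.
for my $name (sort keys %rpm) {
    print join("\t", $name, map { $rpm{$name}{$_} } @tRF_types), "\n";
}
```

Running this should print one line per distinct name, with the two tRNA-Ala-AGC rows summed into a single entry. If the real table has a different column order, only the index expression in the inner loop would need to change.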

In reply to Skript help needed - RegEx & Hashes by PandaRaey
