Running a script across multiple directories with multiple output files (problems comparing hash key values)

msnyder424 has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that looks something like this. I want to use it to search through the current directory, open all directories in that directory, open all files that match certain REs (fastq files, whose format is such that every four lines go together), do some work with these files, and write some results to a file in each directory. (The actual code is much more complex, but since I think I have a structural issue I am showing a simplified version.)

#!user/local/perl
#Created by C. Pells, M. R. Snyder, and N. T. Marshall 2017

#Script trims and merges high throughput sequencing reads from fastq files for a specific primer set

use Cwd;
use warnings;

my $StartTime= localtime;

my $MasterDir = getcwd; #obtains the current directory


opendir (DIR, $MasterDir);
my @objects = readdir (DIR);
closedir (DIR);
foreach (@objects){
    print $_,"\n";
}

my @Dirs = ();
foreach my $O (0..$#objects){
    my $CurrDir = "";
    if ((length ($objects[$O]) < 7) && ($O>1)){ #Checking if the length of the object name is < 7 characters. All samples are 6 or less. removing the first two elements: "." and ".."
        $CurrDir = $MasterDir."/".$objects[$O]; #appends directory name to full path
        push (@Dirs, $CurrDir);
    }
}

foreach (@Dirs){
    print $_,"\n";#checks that all directories were read in
}


foreach my $S (0..$#Dirs){
    my @files = ();
    opendir (DIR, $Dirs[$S]) || die "cannot open $Dirs[$S]: $!";
    @files = readdir DIR; #reads in all files in a directory
    closedir DIR;
    my @AbsFiles = ();
    foreach my $F (0..$#files){
        my $AbsFileName = $Dirs[$S]."/".$files[$F]; #appends file name to full path
        push (@AbsFiles, $AbsFileName);
    }

    foreach my $AF (0..$#AbsFiles){
        if ($AbsFiles[$AF] =~ /_R2_001\.fastq$/m){ #finds reverse fastq file
            my @readbuffer=();
            #read in reverse fastq
            my %RSeqHash;
            my $c = 0;
            print "Reading, reversing, complementing, and trimming reverse fastq file $AbsFiles[$AF]\n";
            open (INPUT1, $AbsFiles[$AF]) || die "Can't open file: $!\n";
            while (<INPUT1>){
                chomp ($_);
                push(@readbuffer, $_);
                if (@readbuffer == 4) {
                    $rsn = substr($readbuffer[0], 0, 45); #trims reverse seq name
                    $cc++ % 10000 == 0 and print "$rsn\n";
                    $RSeqHash{$rsn} = $readbuffer[1];
                @readbuffer = ();
                }
            }
        }
    }
    foreach my $AFx (0..$#AbsFiles){
        if ($AbsFiles[$AFx] =~ /_R1_001\.fastq$/m){ #finds forward fastq file
            print "Reading forward fastq file $AbsFiles[$AFx]\n";
            open (INPUT2, $AbsFiles[$AFx]) || die "Can't open file: $!\n";
            my $OutMergeName = $Dirs[$S]."/"."Merged.fasta";
            open (OUT, ">", "$OutMergeName");
            my $cc=0;
            my @readbuffer = ();
            while (<INPUT2>){
                chomp ($_);
                push(@readbuffer, $_);
                if (@readbuffer == 4) {
                    my $fsn = substr($readbuffer[0], 0, 45); #trims forward seq name
                    #$cc++ % 10000 == 0 and print "$fsn\n$readbuffer[1]\n";
                    if ( exists($RSeqHash{$fsn}) ){ #checks to see if forward seq name is present in reverse seq hash
                        print "$fsn was found in Reverse Seq Hash\n";
                        print OUT "$fsn\n$readbuffer[1]\n"; #ACTUAL OUTPUT FILE IS EMPTY!!!
                    }
                    else {
                        $cc++ % 10000 == 0 and print "$fsn not found in Reverse Seq Hash\n"; #PRINTS THIS FOR EVERY LINE IN INPUT2!!!
                    }
                @readbuffer = ();
                }
            }
            close INPUT1;
            close INPUT2;
            close OUT;
        }
    }
}

I know the script works without iterating over folders, because a simplified version run inside a single folder works, including the REs that find the file names. With this version, though, I just get empty output files. From the print statements I inserted, I've determined that Perl can't find $fsn as a key in %RSeqHash, which is built from INPUT1. I can't understand why: each file is there, and it works when I don't iterate over folders, so I know the keys match. Either I am missing something simple, or I have hit some sort of limit on Perl's memory. Any help is appreciated!
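Because the fastq files keep every four lines together, each record can be read as a single unit instead of collecting lines in @readbuffer and counting them. What follows is a minimal sketch that assumes well-formed input with no record spread over more than four lines. The helper name `next_fastq_record` and the in-memory sample data are hypothetical and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read one 4-line
# FASTQ record (header, sequence, '+' separator, quality) from $fh.
# Returns an array ref of the four chomped lines, or nothing at end of
# file. Assumes no record is split across more than four lines.
sub next_fastq_record {
    my ($fh) = @_;
    my @rec;
    while (@rec < 4) {
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        push @rec, $line;
    }
    return unless @rec;                      # clean end of file
    die "Truncated FASTQ record starting at '$rec[0]'\n" if @rec < 4;
    return \@rec;
}

# Usage with an in-memory file; a real script would open a .fastq path.
my $data = "\@read1\nACGT\n+\nIIII\n\@read2\nGGCC\n+\nIIII\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @names;
while (my $rec = next_fastq_record($fh)) {
    push @names, substr($rec->[0], 0, 45);   # same 45-char trim as the question
}
close $fh;
print "@names\n";    # prints "@read1 @read2"
```

Returning a reference per record keeps the header, sequence and quality lines together. It also turns a truncated final record into a loud error rather than a silently dropped read.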


Re: Running a script across multiple directories with multiple output files (problems comparing hash key values) by huck (Prior) on Aug 08, 2017 at 00:33 UTC
I may have gone in the wrong direction in my earlier reply.

```perl
my %input1 = ();    # initialize the input1 hash ONCE, before both loops
for my $c (0..$#AbsFiles){
    if ($AbsFiles[$c] =~ /R2_001\.fastq$/){
        open INPUT1 ... ;
        # ... stuff to set $input1{$key1} ...
        close INPUT1;
    }
} # c
for my $c (0..$#AbsFiles){
    if ($AbsFiles[$c] =~ /R1_001\.fastq$/){
        open INPUT2 ... ;
        # ... stuff to test $key2 against $input1{$key2} ...
        close INPUT2;
    }
} # c
```

You were resetting the input1 hash every time you opened a file to test for key1.
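Here is a runnable sketch of that structure, applied to the question's own loop bodies. The reverse-read hash is declared once, ahead of both read loops, so the forward-read loop still sees it. The sub name `merge_pair` and the in-memory test data are hypothetical. In the real script, `%RSeqHash` would be created once per directory, that is, once per R2/R1 pair.

```perl
use strict;
use warnings;

# Hypothetical sub: %RSeqHash is declared ONCE, outside both read loops,
# so it is still in scope when the forward file is checked against it.
# The 45-character name trim mirrors the question's code.
sub merge_pair {
    my ($r2_fh, $r1_fh, $out_fh) = @_;
    my %RSeqHash;            # one hash per R2/R1 pair, visible to both loops
    my @buf;

    while (my $line = <$r2_fh>) {    # pass 1: reverse reads
        chomp $line;
        push @buf, $line;
        if (@buf == 4) {
            my $rsn = substr($buf[0], 0, 45);   # trimmed reverse seq name
            $RSeqHash{$rsn} = $buf[1];
            @buf = ();
        }
    }

    @buf = ();                       # discard any incomplete trailing record
    my $merged = 0;
    while (my $line = <$r1_fh>) {    # pass 2: forward reads
        chomp $line;
        push @buf, $line;
        if (@buf == 4) {
            my $fsn = substr($buf[0], 0, 45);   # trimmed forward seq name
            if (exists $RSeqHash{$fsn}) {
                print {$out_fh} "$fsn\n$buf[1]\n";
                $merged++;
            }
            @buf = ();
        }
    }
    return $merged;   # number of forward reads with a matching reverse read
}

# Usage with in-memory files: only "@pair1" appears in both inputs.
my $r2  = "\@pair1\nTTTT\n+\nIIII\n\@rev_only\nAAAA\n+\nIIII\n";
my $r1  = "\@pair1\nACGT\n+\nIIII\n\@fwd_only\nGGGG\n+\nIIII\n";
my $out = '';
open my $r2_fh,  '<', \$r2  or die "Cannot open R2 data: $!";
open my $r1_fh,  '<', \$r1  or die "Cannot open R1 data: $!";
open my $out_fh, '>', \$out or die "Cannot open output buffer: $!";
my $n = merge_pair($r2_fh, $r1_fh, $out_fh);
close $_ for $r2_fh, $r1_fh, $out_fh;
print $out;   # prints "@pair1" and "ACGT" on two lines
```

Passing lexical filehandles into a sub, rather than using barewords like INPUT1, keeps each pair's state local. Nothing can leak from one directory into the next.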
Re: Running a script across multiple directories with multiple output files (problems comparing hash key values) by huck (Prior) on Aug 08, 2017 at 00:22 UTC
You have massive problems with variable scope. A "my" variable only "lives" within the braces that surround it. Change the top of your file from

```perl
#!user/local/perl
use Cwd;
```

to this

```perl
#!user/local/perl
use strict;
use warnings;
use Cwd;
```

and then try to understand what it is telling you. If you are still confused, read Variable Scoping in Perl: the basics. If after that you do not understand how to fix it, come back and show us your progress.

Hint: the key will be to combine your c/d loops into a single loop, something like this:

```perl
for my $c (0..$#AbsFiles){
    my $key1 = undef;
    my $key2 = undef;
    if ($AbsFiles[$c] =~ /R2_001\.fastq$/){
        open INPUT1 ... ;
        # ... stuff to set $key1 ...
        close INPUT1;
    }
    if ($AbsFiles[$c] =~ /R1_001\.fastq$/){
        open INPUT2 ... ;
        # ... stuff to set $key2 ...
        close INPUT2;
    }
    if (defined($key1) && defined($key2) && $key1 eq $key2) {
        # ... stuff to do when both are set and equal ...
    }
    else {
        # ... stuff to do otherwise ...
    }
} # c
```

Note that I escaped the dot in the regexp: an unescaped dot matches any character, while "\." matches a literal dot. Note also that close does not take the less-than/greater-than signs. Those are used to read from a filehandle, not to refer to it. Also note that I closed the files in the same scope I opened them in, which tends to be good practice.

Edit: see my follow-up, Re: Running a script across multiple directories with multiple output files (problems comparing hash key values).
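The two points above can be checked in isolation. Under `use strict`, the question's script does not compile: `%RSeqHash` is used after its `my` block has closed, so Perl reports "Global symbol "%RSeqHash" requires explicit package name". Without strict, the name silently refers to the package hash `%main::RSeqHash`, which is empty. The sketch below shows both effects. The sample key and the dot-less file name are made up purely for illustration.

```perl
use strict;
use warnings;

# 1) Scope: a "my" variable exists only inside the braces that declare it.
my $found_inside;
{
    my %RSeqHash = ('@read1' => 'ACGT');   # lexical, block-scoped
    $found_inside = exists $RSeqHash{'@read1'} ? 1 : 0;
}
# Out here the lexical %RSeqHash is gone. Without "use strict", writing
# %RSeqHash at this point would mean the package hash %main::RSeqHash,
# which nothing ever filled. We look it up explicitly to show it is empty.
my $found_outside = do {
    no warnings 'once';    # %main::RSeqHash is deliberately never assigned
    exists $main::RSeqHash{'@read1'} ? 1 : 0;
};
print "inside: $found_inside, outside: $found_outside\n";   # inside: 1, outside: 0

# 2) Regex: an unescaped "." matches any character; "\." only a literal dot.
my $name    = 'sample_R2_001Xfastq';              # hypothetical name, no dot
my $loose   = $name =~ /R2_001.fastq$/  ? 1 : 0;  # 1: "." matched the "X"
my $escaped = $name =~ /R2_001\.fastq$/ ? 1 : 0;  # 0: needs a literal "."
print "loose: $loose, escaped: $escaped\n";       # loose: 1, escaped: 0
```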
Re: Running a script across multiple directories with multiple output files (problems comparing hash key values) by Anonymous Monk on Aug 08, 2017 at 00:29 UTC
Hi, this is the outline of your code:

```perl
# foreach (@Dirs)
# foreach (@Dirs)
# for ( 0 .. $#AbsDirs )
# for ( 0 .. $#files )
# for ( 0 .. $#AbsFiles )
# if( $AbsFiles[$c] =~ /R2_001.fastq$/ )
# while(<INPUT1>)
# if( @readbuffer == 4 )
# for ( 0 .. $#AbsFiles )
# if( $AbsFiles[$d] =~ /R1_001.fastq$/ )
# while(<INPUT2>)
# if( @readbuffer == 4 )
# if( exists( $input1{$key2} ) )
# else
```

This is how you should write code based on that outline:

```perl
#!/usr/bin/perl --
use strict;
use warnings;
use File::Find::Rule qw/ find rule /;
use Path::Tiny qw/ path /;

my $root = path( grep defined, shift, '.' )->realpath;

for my $file ( find( name => qr/R2_001\.fastq$/, in => $root ) ){
    OneThing( $file );
}
for my $file ( find( name => qr/R1_001\.fastq$/, in => $root ) ){
    TwoThing( $file );
}
exit 0;
```

After looking closer at your while loops, this is how you should write that:

```perl
for my $file ( find( name => qr/R2_001\.fastq$/, in => $root ) ){
    SomeThing( $file );
}
exit 0;

sub SomeThing {
    my( $onein ) = @_;
    my $twoin = $onein;
    $twoin =~ s/R2_001\.fastq$/R1_001.fastq/;
    use Path::Tiny qw/ path /;
    my $out = path( $onein )->parent->child( 'Output.fasta' );  # output file next to the inputs
    OneTwo( $onein, $twoin, $out );
}

sub OneTwo {
    my( $onein, $twoin, $out ) = @_;
    if( not path( $twoin )->exists ){
        warn qq{Skipping "$onein" because "$twoin" does not exist};
        return;
    }
    ...
}
```
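File::Find::Rule and Path::Tiny are CPAN modules. If installing them is not an option, the same "find each R2 file, derive its R1 partner, skip incomplete pairs" step can be sketched with the core File::Find module. The sub name `find_read_pairs` is hypothetical, and the per-directory output name `Merged.fasta` is borrowed from the question. Each returned triple could then be handed to whatever does the merging.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Basename qw(dirname);
use File::Spec;
use File::Temp qw(tempdir);    # used only by the usage example below

# Hypothetical sub: walk $root and return [R2 path, R1 path, output path]
# for every *_R2_001.fastq whose *_R1_001.fastq partner exists.
sub find_read_pairs {
    my ($root) = @_;
    my @pairs;
    find(
        {
            no_chdir => 1,    # keep $File::Find::name as a usable path
            wanted   => sub {
                my $r2 = $File::Find::name;
                return unless -f $r2 && $r2 =~ /_R2_001\.fastq\z/;
                (my $r1 = $r2) =~ s/_R2_001\.fastq\z/_R1_001.fastq/;
                if (!-f $r1) {
                    warn qq{Skipping "$r2" because "$r1" does not exist\n};
                    return;
                }
                my $out = File::Spec->catfile(dirname($r2), 'Merged.fasta');
                push @pairs, [$r2, $r1, $out];
            },
        },
        $root,
    );
    return sort { $a->[0] cmp $b->[0] } @pairs;
}

# Usage: S1 holds a complete pair; S2 lacks its R1 file and is skipped.
my $root = tempdir(CLEANUP => 1);
for my $d (qw(S1 S2)) {
    mkdir File::Spec->catdir($root, $d) or die "mkdir $d: $!";
}
for my $f (qw(S1/x_R1_001.fastq S1/x_R2_001.fastq S2/y_R2_001.fastq)) {
    open my $fh, '>', File::Spec->catfile($root, split m{/}, $f) or die "$f: $!";
    close $fh;
}
my @pairs = find_read_pairs($root);
print "$_->[1] -> $_->[2]\n" for @pairs;
```

Collecting the pairs first and processing them afterwards keeps the directory walk separate from the merge. This also avoids the question's nested index loops over @Dirs and @AbsFiles.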