monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
Suppose I have a pair-series of files as follows:
data1R.fa data1P.fa data2R.fa data2P.fa ... # say each of these files contains lines of numbers, and there are 40 files in all
And I have a script that takes two files, one ending in *R.fa and one ending in *P.fa:
perl mycode.pl data1R.fa data1P.fa
perl mycode.pl data2R.fa data2P.fa
So the code is like this:
use warnings;
use strict;

my $file_r = $ARGV[0];
my $file_p = $ARGV[1];

open FILE_R, "< $file_r" or die "Can't open $file_r : $!";
open FILE_P, "< $file_p" or die "Can't open $file_p : $!";

my @rdata;
my @pdata;    # these two are of the same size

while (<FILE_R>) {
    chomp;
    push @rdata, $_;
}
while (<FILE_P>) {
    chomp;
    push @pdata, $_;
}

my @sum = map { $rdata[$_] + $pdata[$_] } 0 .. $#pdata;

print "$file_r\n";
print ">\n";
foreach my $sum (@sum) {
    print "$sum\n";
}
How can I modify the code above so that it processes all the file pairs iteratively, giving output something like this:
>data1
sum_of_elements from data1R.fa and data1P.fa
>data2
sum_of_elements from data2R.fa and data2P.fa
Regards,
Edward

Update:
A correction has been made to the code above; apologies for the error.

Replies are listed 'Best First'.
Re: Finding Files and Processing them iteratively
by Random_Walk (Prior) on Feb 25, 2005 at 11:28 UTC

    If you mean to run the script with a long list of R/P file names, then this will do it. Note that since the P names are derived from the R names, you only need to give a list of R names. If the extension is always the same, a list of base names would do; it should be trivial to get there from here.

    #!/usr/bin/perl
    use strict;
    use warnings;

    for (@ARGV) {
        next if /P\./;                   # ignore the P files
        my ($base, $ext) = split /R/;    # assume only one 'R' in the file name
        r_p_process( $base . "R" . $ext, $base . "P" . $ext );
    }

    sub r_p_process {
        my $r_file = shift;
        my $p_file = shift;
        print "processing the pair $r_file $p_file\n";
        # do your funky stuff
    }

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!
      Hi Random Walk,
      Thanks so much for answering. I've tested your code. It works great.
      I just wonder how I can modify the code so that it captures the files automatically.

      Currently I have to pass all the file arguments manually, like this:
      perl mycode.pl data1P.fa data1R.fa data2P.fa data2R.fa ....etc
      There are many, many files like this. Can it be automated?
      Regards,
      Edward
Re: Finding Files and Processing them iteratively
by deibyz (Hermit) on Feb 25, 2005 at 12:30 UTC
    You can cycle through all the files matching that pattern with glob.

    Something like this (untested code):

    #!/usr/bin/perl
    use strict;
    use warnings;

    for my $Rfile ( glob "data*R.fa" ) {
        (my $Pfile = $Rfile) =~ s/R(?=\.fa$)/P/;
        # Your code here
    }

Re: Finding Files and Processing them iteratively
by virtualsue (Vicar) on Feb 25, 2005 at 12:45 UTC
    There are plenty of ways to do this™. You have files which match a set pattern, which lends itself very naturally to the use of glob.
    # All "R" data files
    my @data_r = glob "*R.fa";

    # All "P" data files
    my @data_p = glob "*P.fa";
    You don't necessarily need to glob both filename patterns, as you may not be that worried about detecting problems with the dataset itself. I think I would want to validate this, though. I'd want to know if there were more or fewer data*R.fa vs data*P.fa files, or if the number is the same, but the names themselves didn't match up. You can use List::Compare to find out if the two arrays are the same.
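    That cross-check can also be done in core Perl with a pair of hashes; the sketch below (names `base_of` and `unpaired` are mine, not from the thread) reports any file that lacks its partner, which is the same question List::Compare's symmetric-difference methods would answer:

    ```perl
    use strict;
    use warnings;

    # Strip the trailing "R.fa" / "P.fa" to get the base data set name.
    sub base_of {
        my $name = shift;
        $name =~ s/[RP]\.fa$//;
        return $name;
    }

    # Given refs to the two glob results, return refs to two lists:
    # base names that have an R file but no P file, and vice versa.
    sub unpaired {
        my ($r_names, $p_names) = @_;
        my (%r, %p);
        $r{ base_of($_) } = 1 for @$r_names;
        $p{ base_of($_) } = 1 for @$p_names;
        my @r_only = sort grep { !$p{$_} } keys %r;
        my @p_only = sort grep { !$r{$_} } keys %p;
        return (\@r_only, \@p_only);
    }
    ```

    If both lists come back empty, every data set has its full R/P pair and the two globs agree.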

    You can get the base datafile names like this:

    my @base_data = map substr($_, 0, length($_) - 4), @data_r;
    From there it is pretty easy to open and process each set of files (data#R.fa, data#P.fa) iteratively.
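    Putting those pieces together, the pairwise loop might look like this sketch; `sum_pair` is a hypothetical helper standing in for the OP's summing logic, not code from the thread:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read an R file and a P file in parallel and return the line-by-line
    # sums, stopping at the shorter file.
    sub sum_pair {
        my ($r_file, $p_file) = @_;
        open my $r_fh, '<', $r_file or die "Can't open $r_file: $!";
        open my $p_fh, '<', $p_file or die "Can't open $p_file: $!";
        my @sums;
        while (defined(my $r = <$r_fh>)) {
            defined(my $p = <$p_fh>) or last;
            chomp($r, $p);
            push @sums, $r + $p;
        }
        return @sums;
    }

    # Derive the base names from the R files and process each pair.
    for my $base (map { substr $_, 0, length($_) - 4 } glob "data*R.fa") {
        print ">$base\n";
        print "$_\n" for sum_pair("${base}R.fa", "${base}P.fa");
    }
    ```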
Re: Finding Files and Processing them iteratively
by rupesh (Hermit) on Feb 25, 2005 at 12:40 UTC

    You can also use File::Find or File::Recurse.
    #!c:\perl\bin\perl.exe
    use strict;
    use File::Recurse;
    use File::Find;

    my (@filearr, $path, $prod_file, $files, %files, %all);

    sub recurse {
        my $path         = shift;
        my $matchpattern = shift;
        %files = Recurse( ["$path"], { match => "$matchpattern", nomatch => '' } );
        if (scalar keys %all) {
            @all{ keys %files } = values %files;
        }
        else {
            %all = %files;
        }
    }

    {
        recurse("c:\\data", "\.");
        foreach (sort keys %all) {
            my $dirs = $_;
            foreach ( @{ $all{$dirs} } ) {
                $files = $_;
                my $fullname = $dirs . "\\" . $files;
                push @filearr, "$fullname\n";
            }
        }
        foreach my $machine (@filearr) {
            chomp $machine;
            open FH, "<$machine" or die "Can't open $machine : $!";
            my $ctr = 0;
            foreach (<FH>) {
                chomp;
                $ctr += $_;
            }
            close FH;
            $all{$machine} = $ctr;
        }
    }


    Cheers,
    Rupesh.
      If you decide to go with File::Find and are Unix find savvy, check out the handy find2perl script. Write a find command line that does what you want and then replace 'find' with 'find2perl' and it will generate the code for you.

      Although for your specific task, File::Find may be overkill. The various glob solutions others have posted should serve you well.

Re: Finding Files and Processing them iteratively
by blazar (Canon) on Feb 25, 2005 at 13:02 UTC
    my $file_r = $ARGV[0];
    my $file_r = $ARGV[1];
    open FILE_R, "< $file_r" or die "Can't open $file_r : $!";
    open FILE_P, "< $file_p" or die "Can't open $file_p : $!";
    And $file_p is?!?
    while (<FILE_R>){
        chomp;
        push @rdata, $_;
    }
    while (<FILE_R>){
        chomp;
        push @rdata, $_;
    }
    my @sum = map {$rdata[$_] + $pdata[$_]} 0..$#a;
    And @a is?!?

    Please note that while it's easy to understand what your code is supposed to do, the above simply won't compile; it's advisable to post minimal, but complete, working samples of code.

    As a side note, since basically you trust your files to hold the same number of entries, you could avoid putting all of their contents into arrays to process them later (this basically amounts to -unnecessary- slurping) and process them sequentially instead...
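    That sequential approach might look like the following sketch (the sub name `print_sums` is mine; file handling follows the OP's two-argument usage), which emits each sum as soon as it is computed instead of slurping both files into @rdata/@pdata first:

    ```perl
    use strict;
    use warnings;

    # Read the R and P files in lockstep and print each sum immediately;
    # no intermediate arrays are needed.
    sub print_sums {
        my ($file_r, $file_p) = @_;
        open my $fh_r, '<', $file_r or die "Can't open $file_r: $!";
        open my $fh_p, '<', $file_p or die "Can't open $file_p: $!";
        print "$file_r\n>\n";
        while (defined(my $r = <$fh_r>)) {
            defined(my $p = <$fh_p>)
                or die "$file_p has fewer lines than $file_r\n";
            chomp($r, $p);
            print $r + $p, "\n";
        }
    }

    print_sums(@ARGV) if @ARGV == 2;
    ```

    Besides saving memory on large files, reading in lockstep also catches a length mismatch between the two files at the exact line where it occurs.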