Re: processing a lot of files
by scorpio17 (Canon) on Jul 28, 2009 at 19:19 UTC
use strict;
use warnings;

my $dir = 'C:/Documents and Settings/mydir/Desktop/current/Test_Files';
open my $out, '>', "$dir/data.txt" or die "can't open out file: $!";
opendir my $dh, $dir or die "can't opendir $dir : $!";
while (defined(my $f = readdir $dh)) {
    next if $f eq 'data.txt';          # don't read our own output file
    next if -d "$dir/$f";              # skip ., .. and any subdirectories
    open my $fh, '<', "$dir/$f" or die "can't open file $f : $!";
    my $first_line = <$fh>;            # read (and discard) the header line
    while (my $line = <$fh>) {
        chomp $line;
        my ($well, $sample, $barcode, $block_id) = split /\t/, $line;
        my $name = substr($block_id, 11);
        $sample =~ /(\d+)(.*)/;
        print $out "$well\t$1\t$2\t$barcode\t$name\n";
    }
    close $fh;
}
closedir $dh;
close $out;
Notes:
- readdir lets you loop over all the files in a directory, but you still need to open and read each file.
- It's not a good idea to write your output file into the same directory as your input files: your script will try to read it too!
- Be careful with the special variable $_: it is the default variable for many operations (while (<$fh>) puts each line into it, for example), so it's easy for one operation to clobber another's value. I like to store lines read from a file in a named variable, just to be safe.
- The special variables $1 and $2 (regex capture groups) are also easy to clobber: what if you add another regex to your script sometime in the future? So either copy them into other variables right after the regex, or use them immediately.
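The $_ warning above is easy to underestimate, so here is a small sketch of how it bites. The helper names count_lines_clobber and count_lines_safe are hypothetical, and an in-memory string stands in for a real file. Inside a foreach loop without a named variable, $_ is an alias to each array element, so a helper that reads with while (<$fh>) writes every line (and finally undef) straight into the array.

```perl
use strict;
use warnings;

# Hypothetical helper: counts lines using the implicit $_ variable.
# while (<$fh>) assigns each line to the global $_ without localizing it.
sub count_lines_clobber {
    my ($fh) = @_;
    my $n = 0;
    while (<$fh>) { $n++ }
    return $n;
}

# Safer version: read into a lexical variable instead of $_.
sub count_lines_safe {
    my ($fh) = @_;
    my $n = 0;
    while (my $line = <$fh>) { $n++ }
    return $n;
}

my $text = "line 1\nline 2\n";          # stand-in for a real file

my @files = ('a.txt', 'b.txt');         # made-up file names
for (@files) {                          # $_ is aliased to each element of @files
    open my $fh, '<', \$text or die "can't open in-memory file: $!";
    my $n = count_lines_clobber($fh);   # overwrites $_, and therefore the element
    close $fh;
}
# The helper's loop left $_ undef at end of file, so @files is now (undef, undef).
my $status = defined $files[0] ? 'intact' : 'clobbered';
print "files: $status\n";

my @names = ('a.txt', 'b.txt');
for my $name (@names) {                 # a named loop variable leaves $_ alone
    open my $fh, '<', \$text or die "can't open in-memory file: $!";
    print "$name: ", count_lines_safe($fh), " lines\n";
    close $fh;
}
print "names: @names\n";
```

Either fix on its own would have protected @files here: a named loop variable, or a helper that reads into a lexical.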
You pointed me in the right direction! The script works, even though I get that annoying "Use of uninitialized value $1 in concatenation (.) or string" warning (and the same for $2) at my print $out line.
Should I be concerned about that?
my $dir = 'C:/Documents and Settings/mydir/Desktop/current/Test_Files';  # directory to search
opendir my $dh, "$dir";
my $i = 1;
while (my $f = readdir($dh)) {
    next if -d "$dir/$f";
    open(my $in, "$dir/$f");
    open(my $out, ">C:/Documents and Settings/mydir/Desktop/current/Test_Files/outfiles/data$i");
    my $firstline = <$in>;
    chomp $firstline;
    while (my $line = <$in>) {
        chomp $line;
        my ($well_position, $sample, $barcode, $block_id) = split(/\t/, $line);
        my $name = substr($block_id, 11);
        $sample =~ /(\d+)(.*)|(\D\d)/;
        print $out "$well_position\t$1\t$2\t$barcode\t$name\n";
    }
    $i++;
    close($in);
    close($out);
}
closedir($dh);
Thanks!
LomSpace
opendir my $dh, $dir or die "Can not open directory $dir: $!";
I'll bet you have lines in your data files that don't match the regex, so $1 and $2 are undefined when you use them in the print statement. One solution is simply to clean up the input directory before running the script (i.e., make sure it contains no files other than the ones you want the script to process). The other possibility is that your data has junk in it: maybe blank lines or comment lines? If so, you simply need to check for those and skip them.
while (my $line = <$in>) {
    chomp $line;
    $line =~ s/^\s+//;          # strip leading whitespace
    next unless length $line;   # skip blank lines
    next if $line =~ /^#/;      # skip comment lines
    ...
}
Another thing you can do is this:
my ($x, $y) = $sample =~ /(\d+)(.*)/;
$x = '?' unless defined $x;
$y = '?' unless defined $y;
This saves the regex matches into variables, so you don't have to use the special vars $1 and $2 anymore, and you can test them, give them default values, etc.
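One more way to end up with undefined captures: with an alternation like the one in LomSpace's post, /(\d+)(.*)|(\D\d)/, a match that succeeds through the (\D\d) branch sets only $3, leaving $1 and $2 undef. The sketch below applies the list-assignment-plus-defaults idea to that pattern; split_sample and the sample values are hypothetical, and it uses defined() so that a real "0" or an empty suffix is kept rather than replaced.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a sample field into a
# leading number and a suffix, using '?' for any group that did not capture.
sub split_sample {
    my ($sample) = @_;
    # In list context the match returns ($1, $2, $3), or an empty list on
    # failure; only the first two are kept here.
    my ($num, $rest) = $sample =~ /(\d+)(.*)|(\D\d)/;
    $num  = '?' unless defined $num;    # defined() keeps a legitimate "0"
    $rest = '?' unless defined $rest;   # ... and a legitimate empty suffix
    return ($num, $rest);
}

# Made-up sample values for illustration.
for my $s ('12abc', '7', 'A5', 'xyz') {
    my ($num, $rest) = split_sample($s);
    print "$s\t$num\t$rest\n";
}
```

Here 'A5' matches, but only through the third group, so it gets the same '?' defaults as 'xyz', which does not match at all.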
Re: processing a lot of files
by toolic (Bishop) on Jul 28, 2009 at 19:41 UTC
In addition to the suggestions made by others, keep in mind that readdir also returns subdirectory names (including . and ..), not just file names. You may need to filter out directories with one of the -X file test operators, such as -d:
while (my $f = readdir($dh)) {
    next if -d "$dir/$f";   # skip subdirectories, including . and ..
    # ... process the file "$dir/$f" here
}
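The same filtering can also be done up front with grep. This is a sketch under a few assumptions: plain_files is a hypothetical helper, it uses -f (true only for plain files) instead of the -d test above, and the demo builds a throwaway directory with the core File::Temp module so it runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return only the plain files in $dir, skipping
# ".", "..", and any subdirectories that readdir also returns.
sub plain_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return sort @files;
}

# Demo: a temporary directory with two files and one subdirectory.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/outfiles" or die "can't mkdir: $!";
for my $name ('a.txt', 'b.txt') {
    open my $fh, '>', "$dir/$name" or die "can't create $name: $!";
    print $fh "header\n";
    close $fh;
}

my @found = plain_files($dir);
print "$_\n" for @found;    # only a.txt and b.txt; no ., .., or outfiles
```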
That's a good catch, toolic!
Thanks!
LomSpace
Re: processing a lot of files
by SuicideJunkie (Vicar) on Jul 28, 2009 at 17:16 UTC
If something isn't what you expect, print the values and use them to trace the problem back.
Try printing the values where you expect them to be set, and also print the source you expect them to be set from.
For example, you set:
$sample = $fields[1];
but you have no check to see how many fields you actually found. Is $sample undef? Then $fields[1] was undef, which in turn implies that there was no \t in your $_.
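A check like that could look roughly like this. It is only a sketch: split_fields is a hypothetical helper, the expected count of 4 matches the four columns split out in the code elsewhere in this thread, and the sample lines are made up. A line with no tab splits into a single field, so it is reported instead of silently producing an undef $sample.

```perl
use strict;
use warnings;

# Hypothetical helper: split a tab-separated line and check that it has
# at least $want fields before anything downstream uses them.
sub split_fields {
    my ($line, $want) = @_;
    my @fields = split /\t/, $line;
    if (@fields < $want) {
        warn sprintf "expected %d fields, got %d: [%s]\n",
            $want, scalar @fields, $line;
        return;    # empty list signals a bad line
    }
    return @fields;
}

# Made-up input: the second line uses spaces instead of tabs.
for my $line ("A1\t12abc\tBC01\tblock_id_0001234", "A2 12abc BC02") {
    my @fields = split_fields($line, 4) or next;   # skip lines that failed the check
    print "sample = $fields[1]\n";
}
```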
This is not clear, particularly '$_'. I can process using open, but I run into problems with opendir and readdir. I want to change the format of the files.
Still stuck
Try the following on the command line:
perl -e 'opendir DIR, "./"; @directory = readdir(DIR); for $entry (@directory) { print "$entry\n"; }'
In the while loop of your failing example, $f is just a string containing the name of a file; you have to open, process, and close that file inside the loop.
Re: processing a lot of files
by Utilitarian (Vicar) on Jul 28, 2009 at 17:13 UTC
What values do $f and $_ have in these snippets?
The warnings are correct: you have failed to allow for how your changes affect the operation of your loop.