in reply to processing a lot of files

I think you want something like this (untested):

my $dir = 'C:/Documents and Settings/mydir/Desktop/current/Test_Files';

open my $out, '>', "$dir/data.txt" or die "can't open out file: $!";
opendir my $dh, $dir or die "can't opendir $dir : $!";

while (my $f = readdir($dh)) {
    next if ($f eq 'data.txt');
    next if -d "$dir/$f";      # skip '.', '..' and any other directories
    open my $fh, '<', "$dir/$f" or die "can't open file $f : $!";
    my $first_line = <$fh>;    # read and discard the first line
    while (my $line = <$fh>) {
        chomp $line;
        my ($well,$sample,$barcode,$block_id) = split(/\t/, $line);
        my $name = substr($block_id, 11);
        $sample =~ /(\d+)(.*)/;
        print $out "$well\t$1\t$2\t$barcode\t$name\n";
    }
    close $fh;
}
closedir $dh;
close $out;

Notes:

Re^2: processing a lot of files
by lomSpace (Scribe) on Jul 28, 2009 at 21:25 UTC
    You pointed me in the right direction! The script works even
    though I get that annoying "use of uninitialized value of $1 and $2
    in concatenation(.) or string" at my print $out statement line.
    Should I be concerned about that?
    my $dir = 'C:/Documents and Settings/mydir/Desktop/current/Test_Files'; # directory to search
    opendir my $dh, "$dir";
    my $i=1;
    while(my $f = readdir($dh)) {
        next if -d "$dir/$f";
        open(my $in, "$dir/$f");
        open(my $out, ">C:/Documents and Settings/mydir/Desktop/current/Test_Files/outfiles/data$i");
        my $firstline = <$in>;
        chomp $firstline;
        while(my $line = <$in>){
            chomp $line;
            my ($well_position,$sample,$barcode,$block_id) = split(/\t/, $line);
            my $name = substr($block_id, 11);
            $sample =~ /(\d+)(.*)|(\D\d)/;
            print $out "$well_position\t$1\t$2\t$barcode\t$name\n";
        }
        $i++;
        close($in);
        close($out);
    }
    closedir($dh);
    Thanks!
    LomSpace
      I get that annoying "use of uninitialized value of $1 and $2 in concatenation(.) or string" at my print $out statement line. Should I be concerned about that?
      If you are asking this question, that means there is something that you do not understand about your code. Yes, you should be concerned, and yes, you should try to determine the root cause of the warning.

      As an aside, you should always check the results of each open and opendir:

      opendir my $dh, $dir or die "Can not open directory $dir: $!";

      I'll bet you have lines in your data files that don't match the regex, so the $1 and $2 values are undefined when you try to use them in the print statement. One solution is to simply clean up the input before running the script (i.e., make sure there are no files in the input directory other than the files you want the script to process). The other possibility is that your data has junk in it, such as blank lines or comment lines. If so, you simply need to check for those and skip them as needed.

      while(my $line = <$in>) {
          chomp $line;
          $line =~ s/^\s+//;          # strip leading whitespace
          next unless $line;          # skip blank lines
          next if ($line =~ /^#/);    # skip comment line
          ...
      }
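
      Blank or junk lines are one possibility. Another, which may apply here, is the alternation in the question's pattern. Assuming it was meant to read /(\d+)(.*)|(\D\d)/ (the posted version has a stray closing parenthesis), a sample whose first digit is preceded by a non-digit character can match the second branch, which fills $3 and leaves $1 and $2 undefined. A minimal sketch, with made-up sample values that are not taken from the real data files:

      ```perl
      use strict;
      use warnings;

      # Assumption: the question's regex with the stray ")" removed.
      my $pattern = qr/(\d+)(.*)|(\D\d)/;

      # Hypothetical sample values, invented for illustration only.
      my @report;
      for my $sample ('12abc', 'A12', 'abc') {
          if ($sample =~ $pattern) {
              # Report each capture without triggering the warning ourselves.
              my $one = defined $1 ? "'$1'" : 'undef';
              my $two = defined $2 ? "'$2'" : 'undef';
              push @report, "${sample}: \$1=$one \$2=$two";
          }
          else {
              push @report, "${sample}: no match";
          }
      }
      print "$_\n" for @report;
      ```

      In this sketch only '12abc' sets both $1 and $2. 'A12' matches through the second branch, and 'abc' does not match at all, so either kind of value would produce the warning in the print statement.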

      Another thing you can do is this:

      my ($x, $y) = $sample =~ /(\d+)(.*)/;
      $x = '?' unless $x;
      $y = '?' unless $y;

      This saves the regex matches into variables, so you don't have to use the special vars $1 and $2 anymore, and you can test them, give them default values, etc.
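
      One caveat with a plain truth test: "unless $x" also replaces a legitimate sample number of 0, because the string "0" is false in Perl. A possible variant tests with defined and length instead. This is only a sketch; split_sample is a hypothetical helper name and the sample values are invented:

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: split a sample field into its leading number
      # and whatever follows it, substituting '?' for missing parts.
      sub split_sample {
          my ($sample) = @_;
          return ('?', '?') unless defined $sample;

          # In list context a failed match returns an empty list,
          # so both variables stay undefined.
          my ($num, $rest) = $sample =~ /(\d+)(.*)/;

          $num  = '?' unless defined $num;                    # keeps "0"
          $rest = '?' unless defined $rest && length $rest;   # empty becomes '?'
          return ($num, $rest);
      }

      # Made-up inputs for illustration.
      for my $sample ('12abc', '0x', 'abc') {
          my ($num, $rest) = split_sample($sample);
          print "$sample -> $num, $rest\n";
      }
      ```

      Whether a non-matching sample should get a placeholder or be skipped with next depends on what the output file is used for.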