shadowfox has asked for the wisdom of the Perl Monks concerning the following question:

I know there are going to be many approaches to this, but using my initial approach I'm not getting the desired results, at least not entirely.

I want to take a folder, which typically holds thousands of files, but they are small files, maybe 1-5 KB at most. So I used glob to grab all the .txt files and then write the data from all of them into one new combined file in another folder. That works OK: 3000 files are read into a new file in about 10 seconds.

My problem comes next: I want to take the files that were read and move them to an archive folder. New files are written to the source folder all the time, so I naturally don't want to read the same files over again or move any that were not read yet; it needs to be the same array of files, for lack of a better grouping term.

In the context of the code below, the move command is only moving the very first file. It does read all the files, but it's only moving the one, so I've clearly missed something that's hopefully not terribly obvious. I thought about switching over to readdir for the filehandle operations, but I wanted to confirm whether this approach will or won't work first (a rough readdir sketch is included after the output below).

As always thanks in advance for any and all help provided!

# combinedinput.pl
use warnings;
use strict;
use File::Copy;

my $INDIR  = "C:/output/input";
my $OUTDIR = "C:/output/input/archive";
my $NEWFLE = "COMBINED.output";
my $COUNT  = 0;
my $FILE   = "";

print "[Task started on " . scalar localtime . "]\n";

chdir($INDIR);
open( my $FH, '>', "../$NEWFLE" ) or die "etc: $!";

@ARGV = glob('*.txt');
foreach $FILE (@ARGV) {
    while (<>) {
        print $FH $_;
        $COUNT++;
    }
    move( $FILE, $OUTDIR ) or die "Could not move $FILE to $OUTDIR: $!\n";
}
close($FH);

print "\n";
print " Found $COUNT files with data to process \n";
print " Reading all file data into memory \n";
print " $NEWFLE generated successfully \n";
print "\n";
print "[Task completed in ";
print time - $^T . " seconds] \n";
#END
<output results>
[Task started on Fri Feb 10 13:49:17 2012]

 Found 3012 files with data to process
 Reading all file data into memory
 COMBINED.output generated successfully

[Task completed in 11 seconds]
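For reference, the readdir variant mentioned above might look roughly like this (an untested sketch; it builds the file list once and reuses that exact list for the move, so files that arrive while the script runs are left alone):

# combinedinput_readdir.pl -- hypothetical sketch, same paths as above
use warnings;
use strict;
use File::Copy;

my $INDIR  = "C:/output/input";
my $OUTDIR = "C:/output/input/archive";
my $NEWFLE = "COMBINED.output";

# Grab the list of .txt files once.
opendir( my $DH, $INDIR ) or die "Cannot open $INDIR: $!";
my @FILES = grep { /\.txt\z/i && -f "$INDIR/$_" } readdir($DH);
closedir($DH);

open( my $FH, '>', "$INDIR/../$NEWFLE" ) or die "Cannot write $NEWFLE: $!";
foreach my $FILE (@FILES) {
    open( my $IN, '<', "$INDIR/$FILE" ) or die "Cannot read $FILE: $!";
    print $FH $_ while <$IN>;
    close($IN);
    # Move only files from the list captured above.
    move( "$INDIR/$FILE", $OUTDIR ) or die "Could not move $FILE to $OUTDIR: $!";
}
close($FH);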

Re: Read contents of multiple files into new file and then move source files?
by Eliya (Vicar) on Feb 10, 2012 at 19:23 UTC
    while (<>) {

    reads all the files in @ARGV in one go (emptying @ARGV while doing so), so you get to execute the foreach loop only once...

    One solution would be to open the files individually using open. Another would be to copy @ARGV into another array, and do the moving afterwards.
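    A minimal sketch of the first option (untested; it assumes the same $FH, $FILE, $COUNT, $OUTDIR, and File::Copy setup as the original script):

        @ARGV = glob('*.txt');
        foreach $FILE (@ARGV) {
            # A per-file lexical handle instead of the magic <> handle,
            # so @ARGV stays intact for the move.
            open( my $IN, '<', $FILE ) or die "Cannot open $FILE: $!";
            while (<$IN>) {
                print $FH $_;
                $COUNT++;
            }
            close($IN);
            move( $FILE, $OUTDIR ) or die "Could not move $FILE to $OUTDIR: $!\n";
        }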

      Thanks, that's what I was missing; I didn't realize the diamond operator emptied the array as it read through the files. Changing the middle portion of the code as suggested works fine; I just needed to save the array in another array for later use.
      @ARGV = glob('*.txt');
      if (@ARGV) {
          my @MOVE = @ARGV;
          while (<>) {
              print $FH $_;
              $COUNT++;
          }
          foreach $FILE (@MOVE) {
              move( $FILE, $OUTDIR ) or die "Could not move $FILE to $OUTDIR: $!\n";
          }
      }
Re: Read contents of multiple files into new file and then move source files?
by admiral_grinder (Pilgrim) on Feb 10, 2012 at 19:50 UTC
    Here is an approach using a few more modules from CPAN. I think it makes the script more expandable in the future and more readable six months from now.
    use warnings;
    use strict;
    use File::Copy;
    use Path::Class;        # file(), dir()
    use File::Find::Rule;

    my $INDIR  = "C:/output/input";
    my $OUTDIR = "C:/output/input/archive";
    my $NEWFLE = "COMBINED.output";
    my $COUNT  = 0;

    print "[Task started on " . scalar localtime . "]\n";

    my $new_file = file( $INDIR, '..', $NEWFLE );
    $new_file->touch();
    my $FH = $new_file->openw() or die "etc: $!";

    my @files = File::Find::Rule->file()->name( qr/.*\.txt/i )->in( $INDIR );
    @files = map { file( $_ ) } @files;

    foreach my $file ( @files ) {
        # Skip the file if it is already in the archive path.
        my $archive_path = file( $OUTDIR, $file->basename() );
        next if -e $archive_path;

        # Process the file.
        $COUNT++;
        my @lines = $file->slurp();
        print $FH $_ for @lines;

        # You have to "stringify" Path::Class objects passed to File::Copy.
        move( "$file", "$archive_path" ) or die "Could not move $file to $OUTDIR: $!\n";
    }
    $FH->close();

    print "\n";
    print " Found $COUNT files with data to process \n";
    print " Reading all file data into memory \n";
    print " ${new_file} generated successfully \n";
    print "\n";
    print "[Task completed in ";
    print time - $^T . " seconds] \n";
Re: Read contents of multiple files into new file and then move source files?
by jdrago999 (Pilgrim) on Feb 10, 2012 at 19:42 UTC
    cat /files/* > new-file.txt && mv /files/* /new-folder/

    Ahh but you're on Windows...

      Wondering: Is it possible for a file to be created after the first wildcard is interpreted, but before the second, so a file would get moved that wasn't catted?

      Aside from being on Windows, he's also talking about thousands of files, so a shell might balk at that many anyway. That'd lead to using find, which done right should eliminate any concerns about moving unprocessed files.
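      For example (a rough sketch, assuming GNU find and xargs, and no newlines in filenames): capture the file list once, then cat and move exactly that list, so files created in between are never touched.

          find /files -maxdepth 1 -name '*.txt' > /tmp/batch.lst
          xargs -d '\n' cat < /tmp/batch.lst > new-file.txt
          xargs -d '\n' mv -t /new-folder/ < /tmp/batch.lst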

      Aaron B.
      My Woefully Neglected Blog, where I occasionally mention Perl.

        Yes, yes, and yes.

        You get my ++ upvote across the board. My reply was hasty and without caffeine.