in reply to Merging files

If you want to use only one filehandle, you can use a mixture of open/seek/tell/close to re-use a single filehandle; see the code below. Of course this is going to be really slow, as you will have to perform all four operations for every single line of each file.

Having a pool of open file handles would help some: as long as you are merging fewer files than the size of the pool, you merge them the "natural" way (just read one line at a time from each filehandle), and if you need more, you use the method below for the files that are not in the pool (a rough sketch of this hybrid approach follows the code below).

Does it make sense?

#!/usr/bin/perl -w
use strict;
use Fatal qw(open close); # so I don't have to bother testing them

my @files= @ARGV;
my %marker= map { $_ => 0 } @files;

while( keys %marker) {
    foreach my $file (@files) {
        if( exists $marker{$file}) {
            open( my $fh, '<', $file);
            seek( $fh, $marker{$file}, 0);  # the 0 means 'set the new position in bytes to' $marker{$file}
            if( defined( my $line= <$fh>)) {
                print $line;
                $marker{$file}= tell $fh;
            }
            else {
                delete $marker{$file};
            }
            close $fh;
        }
    }
}
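Here is a rough sketch of the pool idea above (my own illustration, not part of the original post; the pool size and the fallback bookkeeping are assumptions): the first few files keep their handle open for the whole run, while every other file falls back to the open/seek/read/tell/close dance.

#!/usr/bin/perl
use strict;
use warnings;

my $POOL_SIZE = 2;      # assumed pool size, set it to whatever your system allows
my @files     = @ARGV;

my %pool;               # file => permanently open handle (the pool)
my %marker;             # file => byte offset, for the files outside the pool
foreach my $i (0 .. $#files) {
    my $file = $files[$i];
    if ($i < $POOL_SIZE) { open $pool{$file}, '<', $file or die "open $file: $!"; }
    else                 { $marker{$file} = 0; }
}

my %done;               # files that have reached eof
while (keys %done < @files) {
    foreach my $file (@files) {
        next if $done{$file};
        my $line;
        if (exists $pool{$file}) {                    # the "natural" way
            $line = readline $pool{$file};
        }
        else {                                        # fall back to open/seek/read/tell/close
            open my $fh, '<', $file or die "open $file: $!";
            seek $fh, $marker{$file}, 0;
            $line = <$fh>;
            $marker{$file} = tell $fh;
            close $fh;
        }
        if (defined $line) { print $line; }
        else               { $done{$file} = 1; }
    }
}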

Re^2: Merging files
by sk (Curate) on Apr 11, 2005 at 08:12 UTC
    Wow this is so cool! Thanks Boris/C/Mirod!

    I initially thought I would just learn how to open multiple filehandles, but I got lured into implementing a paste-command-like program (not there yet :)). Here is my code, but for some reason my $line variable does not get set to undef when all the filehandles run out of data... Should I be checking for something else to terminate the loop?

    I guess the code would be much faster if I could generate Perl code that reads all the filehandles in one statement instead of looping through them (see the sketch after the code below)...

    #! /usr/local/bin/perl -w
    # Open many file handles
    # C's code

    foreach my $file (@ARGV) {
        my $fh;
        open $fh, "<", $file or die "Can't open $file ($!)";
        push @filehandles, $fh;
    }

    while (1) {
        $line = undef;  # Not sure whether this is even required.
        foreach (@filehandles) {
            $line .= <$_>;
            chomp($line);
        }
        print ($line,"\n");
        last if undef($line);
    }
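    As an aside on the one-statement idea above, here is a minimal sketch (my own illustration, not from the thread) that reads one line from every handle in a single expression. map still loops internally, so it is unlikely to be any faster, and undef at eof simply becomes an empty string:

    # read one line per handle and print them joined on a single output line
    print join('', map { my $l = <$_>; defined $l ? do { chomp $l; $l } : '' } @filehandles), "\n";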
      That is because the lhs of .= is always a defined value after the append, even if you append undef, so your $line never becomes undef.
      perl -MData::Dumper -e '$x .= undef; print Dumper($x)'
      __OUTPUT__
      $VAR1 = '';
      Boris

      I think you should keep track of which filehandles are still open, and read only those. And I am always wary of while(1) loops. But it's probably just me ;--)

      #!/usr/bin/perl -w
      use strict;
      use Fatal qw(open close);

      my @files= @ARGV;

      my %fh; # file => file handle
      foreach my $file (@files) {
          open( my $fh, '<', $file);
          $fh{$file}= $fh;
      }

      while( keys %fh) {
          foreach my $file (@files) {
              if( exists $fh{$file}) {
                  my $fh= $fh{$file};
                  if( defined( my $line= <$fh>)) { chomp $line; print $line; }  # regular line
                  else                           { delete $fh{$file}; }        # eof reached for this file
              }
          }
          print "\n";   # end of line for all files
      }
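      For illustration (my own example, the file and script names are made up): assuming a.txt holds the lines a1 and a2, and b.txt holds b1 and b2, the script above pastes the files together line by line; note that the last round, in which every handle hits eof, still prints one extra empty line.

      perl merge.pl a.txt b.txt
      __OUTPUT__
      a1b1
      a2b2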
        Thanks again! Now I understand the undef issue. Also, thanks for the one-liner; it showed me how to use Dumper!!!

        Mirod, I agree, it is definitely better to keep track of the filehandles and close them when they run out of data!

        Learnt a few nice things today :)

        I agree, while(1) loops can be bad. I'm fond of doing something similar to what you are doing instead of while(1). I have spent more time than I care to remember debugging problems caused by poor while(1) implementations.

        I have a few minor nits to pick with your example.

        1. You don't close any of your file handles. Why keep them around once you are done with them? I would inject a close right before your delete.
        2. Instead of repeatedly checking every file in the list you are processing, why not just check the files that are still open? The foreach my $file (@files) loop is easy to swap out for a foreach my $file (keys %fh).

        The change would look something like this:

        while (keys %fh) {
            foreach my $file (keys %fh) {   # only the files that are still open
                my $fh = $fh{$file};
                if (defined(my $line = <$fh>)) {
                    chomp $line;
                    print $line;
                }
                else {
                    close($fh);             # done with this handle, close it
                    delete($fh{$file});
                }
            }
            print "\n";                     # end of line for all files
        }