in reply to Merging files

If you want to use only one filehandle, you can use a mixture of open/seek/tell/close to re-use a single filehandle; see the code below. Of course this is going to be really slow, as you will have to perform all four operations for every single line of each file.

Having a pool of open file handles would help some: as long as you are merging fewer files than the size of the pool, you merge them the "natural" way (just read one line at a time from each filehandle), and if you need more, you use the method below for the files that are not in the pool (a rough sketch of this hybrid approach follows the code below).

Does it make sense?

#!/usr/bin/perl -w
use strict;
use Fatal qw(open close); # so I don't have to bother testing them

my @files= @ARGV;
my %marker= map { $_ => 0 } @files;

while( keys %marker) {
    foreach my $file (@files) {
        if( exists $marker{$file}) {
            open( my $fh, '<', $file);
            seek( $fh, $marker{$file}, 0);  # the 0 means 'set the new position in bytes to' $marker{$file}
            if( defined( my $line= <$fh>)) {
                print $line;
                $marker{$file}= tell $fh;
            }
            else {
                delete $marker{$file};
            }
            close $fh;
        }
    }
}
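Here is a rough sketch of the pool idea above (my own illustration, not part of the original post; the pool size and the fallback bookkeeping are assumptions): the first few files keep their handle open for the whole run, while every other file falls back to the open/seek/read/tell/close dance.

#!/usr/bin/perl
use strict;
use warnings;

my $POOL_SIZE = 2;      # assumed pool size, set it to whatever your system allows
my @files     = @ARGV;

my %pool;               # file => permanently open handle (the pool)
my %marker;             # file => byte offset, for the files outside the pool
foreach my $i (0 .. $#files) {
    my $file = $files[$i];
    if ($i < $POOL_SIZE) { open $pool{$file}, '<', $file or die "open $file: $!"; }
    else                 { $marker{$file} = 0; }
}

my %done;               # files that have reached eof
while (keys %done < @files) {
    foreach my $file (@files) {
        next if $done{$file};
        my $line;
        if (exists $pool{$file}) {                    # the "natural" way
            $line = readline $pool{$file};
        }
        else {                                        # fall back to open/seek/read/tell/close
            open my $fh, '<', $file or die "open $file: $!";
            seek $fh, $marker{$file}, 0;
            $line = <$fh>;
            $marker{$file} = tell $fh;
            close $fh;
        }
        if (defined $line) { print $line; }
        else               { $done{$file} = 1; }
    }
}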

Re^2: Merging files
by sk (Curate) on Apr 11, 2005 at 08:12 UTC
    Wow this is so cool! Thanks Boris/C/Mirod!

    I initially thought I would just learn how to open multiple filehandles, but I got lured into implementing a paste-command-like program (not there yet :)). Here is my code, but for some reason my $line variable does not get set to undef when all the filehandles run out of data... Should I be checking for something else to terminate the loop?

    I guess the code would be much faster if I could generate Perl code that reads all the filehandles in one statement instead of looping through them (see the sketch after the code below)...

    #! /usr/local/bin/perl -w
    # Open many file handles
    # C's code

    foreach my $file (@ARGV) {
        my $fh;
        open $fh, "<", $file or die "Can't open $file ($!)";
        push @filehandles, $fh;
    }

    while (1) {
        $line = undef;  # Not sure whether this is even required.
        foreach (@filehandles) {
            $line .= <$_>;
            chomp($line);
        }
        print ($line,"\n");
        last if undef($line);
    }
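    As an aside on the one-statement idea above, here is a minimal sketch (my own illustration, not from the thread) that reads one line from every handle in a single expression. map still loops internally, so it is unlikely to be any faster, and undef at eof simply becomes an empty string:

    # read one line per handle and print them joined on a single output line
    print join('', map { my $l = <$_>; defined $l ? do { chomp $l; $l } : '' } @filehandles), "\n";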
      That is because the lhs of .= is always a defined value after the append, even if you append undef, so your $line never becomes undef.
      perl -MData::Dumper -e '$x .= undef; print Dumper($x)'
      __OUTPUT__
      $VAR1 = '';
      Boris

      I think you should keep track of which filehandles are still open, and read only those. And I am always wary of while(1) loops. But it's probably just me ;--)

      #!/usr/bin/perl -w
      use strict;
      use Fatal qw(open close);

      my @files= @ARGV;

      my %fh; # file => file handle
      foreach my $file (@files) {
          open( my $fh, '<', $file);
          $fh{$file}= $fh;
      }

      while( keys %fh) {
          foreach my $file (@files) {
              if( exists $fh{$file}) {
                  my $fh= $fh{$file};
                  if( defined( my $line= <$fh>)) { chomp $line; print $line; }  # regular line
                  else                           { delete $fh{$file}; }        # eof reached for this file
              }
          }
          print "\n";   # end of line for all files
      }
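      For illustration (my own example, the file and script names are made up): assuming a.txt holds the lines a1 and a2, and b.txt holds b1 and b2, the script above pastes the files together line by line; note that the last round, in which every handle hits eof, still prints one extra empty line.

      perl merge.pl a.txt b.txt
      __OUTPUT__
      a1b1
      a2b2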
        Thanks again! Now I understand the undef issue. Also, thanks for the one-liner; it showed me how to use Dumper!!!

        Mirod, I agree, it is definitely better to keep track of the filehandles and close them when they run out of data!

        Learnt a few nice things today :)

        I agree, while(1) loops can be bad. I'm fond of doing something similar to what you are doing instead of while(1). I have spent more time than I care to remember debugging problems caused by poor while(1) implementations.

        I have a few minor nits to pick with your example.

        1. You don't close any of your file handles. Why keep them around once you are done with them? I would inject a close right before your delete.
        2. Instead of repeatedly checking every file in the list you are processing, why not just check the files that are still open? The foreach my $file (@files) loop is easy to swap out for a foreach my $file (keys %fh).

        The change would look something like this:

        while (keys %fh) {
            foreach my $file (keys %fh) {   # only the files that are still open
                my $fh = $fh{$file};
                if (defined(my $line = <$fh>)) {
                    chomp $line;
                    print $line;
                }
                else {
                    close($fh);             # done with this handle, close it
                    delete($fh{$file});
                }
            }
            print "\n";                     # end of line for all files
        }