Re: Naming file handles with variables?
by GrandFather (Saint) on Apr 30, 2009 at 04:41 UTC
You really don't want to be managing an unknown number of file handles 'retail'. Instead think of using a collection of handles. The easiest way is probably an array, although depending on the nature of the rest of the task a hash keyed by file name may be a better choice. Consider:
use strict;
use warnings;
my @fileNames = ('file1.txt', 'foo.txt', 'wibble.wav');
my @fileHandles;
for my $filename (@fileNames) {
    open $fileHandles[@fileHandles], '<', $filename or die "Can't open $filename: $!";
}
while (@fileHandles) {
    for my $file (@fileHandles) {
        my $line = <$file>;
        if (! defined $line) {
            # Hit end of file
            close $file or die "File close failed: $!";
            $file = undef;
            next;
        }
        # do something with $line
    }
    @fileHandles = grep {defined} @fileHandles;
}
which opens a bunch of files then enters a loop that reads a line from each file in turn and does something with each line.
True laziness is hard work
Should that be [$filename] here?
for my $filename (@fileNames) {
    open $fileHandles[$filename], '<', $filename or die "Can't open $filename: $!";
-QM
--
Quantum Mechanics: The dreams stuff is made of
No - $fileHandles[@fileHandles] uses @fileHandles in scalar context, which gives the current number of elements, so each open fills the next free slot of the array. It may be clearer to write
open my $newFileHandle, ...;
push @fileHandles, $newFileHandle;
to achieve the same effect.
If fileHandles were a hash instead of an array then keying by the file name would be appropriate however.
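For what it's worth, a minimal sketch of that hash variant (reusing the @fileNames list from the node above):
my %fileHandles;
for my $filename (@fileNames) {
    open $fileHandles{$filename}, '<', $filename or die "Can't open $filename: $!";
}
my $line = readline $fileHandles{'foo.txt'};    # read a line from a particular file by name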
True laziness is hard work
Re: Naming file handles with variables?
by CountZero (Bishop) on Apr 30, 2009 at 06:42 UTC
I think this will do what you want. It handles any number of files to be opened and the subroutine will return at each iteration a reference to an array holding the next line of each of these files. When all files are exhausted it returns undef.
use strict;
my @filenames = qw/one.txt two.txt three.txt/;
my @filehandles;
foreach my $filename (@filenames) {
    open my $fh, '<', $filename or die "Could not open $filename; $!";
    push @filehandles, $fh;
} ## end foreach my $filename (@filenames)
while (1) {
    my $lines_ref = read_lines_parallel(@filehandles);
    last unless $lines_ref;
    print join '|', @$lines_ref;
    print '-' x 20, "\n";
} ## end while (1)

sub read_lines_parallel {
    my @filehandles = @_;
    my @lines;
    foreach (@filehandles) {
        push @lines, scalar <$_>;
    } ## end foreach (@filehandles)
    if ( join '', @lines ) {
        return \@lines;
    } ## end if ( join '', @lines )
    else {
        return undef;
    } ## end else [ if ( join '', @lines )
} ## end sub read_lines_parallel
Output:
first line file 1
|line 1 file 2
|first line file 3
--------------------
second line file 1
|line 2 file 2
|second line file 3
--------------------
third line file 1
||third line file 3
--------------------
||fourth line file 3
--------------------
Update: I do not know what herbs I put in my tea when I wrote something as ugly as
while (1) {
    my $lines_ref = read_lines_parallel(@filehandles);
    last unless $lines_ref;
    print join '|', @$lines_ref;
    print '-' x 20, "\n";
}
Obviously it should be
while (my $lines_ref = read_lines_parallel(@filehandles)) {
    # Do something with $lines_ref or @$lines_ref here
    print join '|', @$lines_ref;
    print '-' x 20, "\n";
}
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re: Naming file handles with variables?
by roubi (Hermit) on Apr 30, 2009 at 03:38 UTC
'open' allows you to specify a lexical variable for the filehandle name, like so:
open(my $fh, '<', "file.txt") or die("Can't open file.txt: $!");
while(my $line = <$fh>) {
# stuff here
}
close $fh;
See Indirect Filehandles.
Re: Naming file handles with variables?
by citromatik (Curate) on Apr 30, 2009 at 07:09 UTC
Another (and less efficient) way, using Tie::File:
use strict;
use warnings;
use Tie::File;
my @fnames = @ARGV;
my @fhandlers;
for my $fname (@fnames) {
    tie my @farr, 'Tie::File', $fname or die $!;
    push @fhandlers, \@farr;
}
# Traverse the files:
for my $i (0 .. $#{$fhandlers[0]}) {
    my @nextvals = map { $_->[$i] } @fhandlers[0 .. $#fhandlers];
    # @nextvals has the next line of each file
}
untie @$_ for @fhandlers;    # untie the tied arrays themselves, not the references
Beware of possible overheads if the files are very big: (0..$#{$fhandlers[0]}) traverses the whole first file just to find out how many lines it has.
Update: Added note about efficiency
Re: Naming file handles with variables?
by whakka (Hermit) on Apr 30, 2009 at 04:08 UTC
The standard way to do what you want is to open and read one file at a time, keeping whatever data you care about in variables. The example code you gave (logically) does this - it reads every line from FILE1, then FILE2, then FILE3. This is because || is short-circuited: as long as the first condition is true, the whole expression evaluates true. This keeps happening until the end of the first file, etc. (it's safer to check line existence with defined though).
Instead you could read in from standard input with a simple while ( <> ) { ... } and pipe input to the program from elsewhere. Or you could take a list of filenames as arguments in @ARGV and process them individually:
for my $file ( @ARGV ) {
    open my $fh, '<', $file or die "$file: $!";
    while ( <$fh> ) {
        ...
    }
    close $fh;
}
Perhaps I should ask: is there any particular reason you need all filehandles open at once?
while ($line1 = <FILE1> ||
       $line2 = <FILE2> ||
       $line3 = <FILE3>) {
    #do stuff with each line
}
It says, Can't modify logical or (||) in scalar assignment at pl7.pl line 28, near "<FILE3>) "
Execution of pl7.pl aborted due to compilation errors.
We have to use 'or' instead.
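For illustration, the same loop rewritten with 'or' (a sketch; it compiles because 'or' binds more loosely than the assignments, but note that it still short-circuits exactly as described above, so FILE1 is read to exhaustion before FILE2 is touched):
while ($line1 = <FILE1> or
       $line2 = <FILE2> or
       $line3 = <FILE3>) {
    # do stuff with each line
}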
You can also wrap the assignment in parentheses, but I qualified my statement with "logically" to address what the code was conceptually doing.
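For example, a sketch of the parenthesized form, which also compiles with ||:
while (($line1 = <FILE1>) ||
       ($line2 = <FILE2>) ||
       ($line3 = <FILE3>)) {
    # do stuff with each line
}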
Re: Naming file handles with variables?
by lakshmananindia (Chaplain) on Apr 30, 2009 at 03:42 UTC
...I realize that you can open multiple files at one time..
while ($line1 = <FILE1> ||
       $line2 = <FILE2> ||
       $line3 = <FILE3>) {
    #do stuff with each line
}
The above will not open multiple files. The <FILE1> operator just reads from the filehandle FILE1, which has to have been opened already using open.
You can say open my $fh, '<', 'file1' and then read from it using while (<$fh>).
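A minimal sketch of that (the file name is just a placeholder):
open my $fh, '<', 'file1' or die "Can't open file1: $!";
while (my $line = <$fh>) {
    # do stuff with $line
}
close $fh;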
--Lakshmanan G.
The great pleasure in my life is doing what people say you cannot do.
Re: Naming file handles with variables?
by happy.barney (Friar) on Apr 30, 2009 at 09:15 UTC
use IO::File;
@handles = grep defined, map {
    IO::File->new($_, 'r')
        || warn ($_, ': cannot open: ', $!)
        && undef    # warn returns true, so '&& undef' leaves undef for grep to drop
} @file_names;
while (@list = grep defined, map scalar <$_>, @handles) {
    for $line (@list) {
        # do stuff with $line
    }
}
Re: Naming file handles with variables?
by koptons (Initiate) on Apr 30, 2009 at 12:49 UTC
Perl has a command line option '-n' that supports processing multiple files at the same time. Here $_ will hold each line. This option is much faster than using a FileHandle for the cases where the number of files is not known. Look for the -n option in perlrun.
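For example (a sketch; the file names are placeholders):
perl -ne 'print "$ARGV: $_"' file1.txt file2.txt file3.txt
which prints every line of each named file, prefixed with the file it came from.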
No, not quite - from perlrun, we see:
-n
causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk: ...
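That assumed loop is essentially while (<>) { ... }, i.e. the named files are read one after another, not side by side.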
Since it [the -n option] causes perl to iterate over the files in turn, it self-evidently doesn't meet the requirements of the OP.
A user level that continues to overstate my experience :-))
Re: Naming file handles with variables?
by NiJo (Friar) on Apr 30, 2009 at 17:30 UTC
Your problem has been solved even before perl existed. 'paste' combines files in a column by column fashion. I'd use bits and pieces like these:
my $command = 'paste ' . join(' ', @ARGV);
open my $fh, '-|', $command or die "Can't run paste: $!";
while (<$fh>) {
    my @one_line = split "\t", $_;
    # do something
}
But I suspect that your application needs something more efficient, as the solutions presented up to now cause many disk seeks. Ignoring caches, there is a seek for each line of each file!
I'd 'slurp' all files into an @array[file_no][line_no] structure, one file after another. This is done linearly and all at once. If the array does not fit into RAM, I'd use Tie::File.
See e. g. http://www.sysarch.com/Perl/slurp_article.html
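A minimal sketch of that slurping approach (assuming, as above, that the file names arrive in @ARGV):
my @lines;    # $lines[$file_no][$line_no]
for my $file_no (0 .. $#ARGV) {
    open my $fh, '<', $ARGV[$file_no] or die "Can't open $ARGV[$file_no]: $!";
    $lines[$file_no] = [<$fh>];    # slurp the whole file as a list of lines
    close $fh;
}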
However 'Ignoring caches' and other real world aspects of programming leads to bad decisions. In this case, for example, caching means that the line-at-a-time technique is likely to scale very well for extremely large files, whereas slurping the files is likely to lead to thrashing the hard disk even for fairly modest (by today's standards) file sizes.
In general slurping is a bad design choice, and for a line-by-line task such as the OP indicates, Tie::File is likely to be (at best) little more than syntactic sugar, and at worst may impose significantly more overhead than the multiple file handle solutions already offered.
True laziness is hard work