Re: Naming file handles with variables?
by GrandFather (Saint) on Apr 30, 2009 at 04:41 UTC
You really don't want to be managing an unknown number of file handles 'retail'. Instead think of using a collection of handles. The easiest way is probably an array, although depending on the nature of the rest of the task a hash keyed by file name may be a better choice. Consider:
use strict;
use warnings;
my @fileNames = ('file1.txt', 'foo.txt', 'wibble.wav');
my @fileHandles;
for my $filename (@fileNames) {
    open $fileHandles[@fileHandles], '<', $filename or die "Can't open $filename: $!";
}
while (@fileHandles) {
    for my $file (@fileHandles) {
        my $line = <$file>;
        if (! defined $line) {
            # Hit end of file
            close $file or die "File close failed: $!";
            $file = undef;
            next;
        }
        # do something with $line
    }
    @fileHandles = grep {defined} @fileHandles;
}
which opens a bunch of files then enters a loop that reads a line from each file in turn and does something with each line.
True laziness is hard work
Should that be [$filename] here?
for my $filename (@fileNames) {
    open $fileHandles[$filename], '<', $filename or die "Can't open $filename: $!";
-QM
--
Quantum Mechanics: The dreams stuff is made of
No - $fileHandles[@fileHandles] uses @fileHandles in scalar context, which gives the current number of elements, so each open fills the next free slot of the array. It may be clearer to write
open my $newFileHandle, ...;
push @fileHandles, $newFileHandle;
to achieve the same effect.
If fileHandles were a hash instead of an array then keying by the file name would be appropriate however.
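For what it's worth, a minimal sketch of that hash variant (reusing the @fileNames list from the node above):
my %fileHandles;
for my $filename (@fileNames) {
    open $fileHandles{$filename}, '<', $filename or die "Can't open $filename: $!";
}
my $line = readline $fileHandles{'foo.txt'};    # read a line from a particular file by name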
True laziness is hard work
Re: Naming file handles with variables?
by CountZero (Bishop) on Apr 30, 2009 at 06:42 UTC
I think this will do what you want. It handles any number of files to be opened and the subroutine will return at each iteration a reference to an array holding the next line of each of these files. When all files are exhausted it returns undef.
use strict;
my @filenames = qw/one.txt two.txt three.txt/;
my @filehandles;
foreach my $filename (@filenames) {
    open my $fh, '<', $filename or die "Could not open $filename; $!";
    push @filehandles, $fh;
} ## end foreach my $filename (@filenames)
while (1) {
    my $lines_ref = read_lines_parallel(@filehandles);
    last unless $lines_ref;
    print join '|', @$lines_ref;
    print '-' x 20, "\n";
} ## end while (1)

sub read_lines_parallel {
    my @filehandles = @_;
    my @lines;
    foreach (@filehandles) {
        push @lines, scalar <$_>;
    } ## end foreach (@filehandles)
    if ( join '', @lines ) {
        return \@lines;
    } ## end if ( join '', @lines )
    else {
        return undef;
    } ## end else [ if ( join '', @lines )
} ## end sub read_lines_parallel
Output:
first line file 1
|line 1 file 2
|first line file 3
--------------------
second line file 1
|line 2 file 2
|second line file 3
--------------------
third line file 1
||third line file 3
--------------------
||fourth line file 3
--------------------
Update: I do not know what herbs I put in my tea when I wrote something as ugly as
while (1) {
    my $lines_ref = read_lines_parallel(@filehandles);
    last unless $lines_ref;
    print join '|', @$lines_ref;
    print '-' x 20, "\n";
}
Obviously it should be
while (my $lines_ref = read_lines_parallel(@filehandles)) {
    # Do something with $lines_ref or @$lines_ref here
    print join '|', @$lines_ref;
    print '-' x 20, "\n";
}
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re: Naming file handles with variables?
by roubi (Hermit) on Apr 30, 2009 at 03:38 UTC
'open' allows you to specify a lexical variable for the filehandle name, like so:
open(my $fh, '<', "file.txt") or die("Can't open file.txt: $!");
while(my $line = <$fh>) {
# stuff here
}
close $fh;
See Indirect Filehandles.
Re: Naming file handles with variables?
by citromatik (Curate) on Apr 30, 2009 at 07:09 UTC
Another (and less efficient) way, using Tie::File:
use strict;
use warnings;
use Tie::File;
my @fnames = @ARGV;
my @fhandlers;
for my $fname (@fnames) {
    tie my @farr, 'Tie::File', $fname or die $!;
    push @fhandlers, \@farr;
}
# Traverse the files:
for my $i (0 .. $#{$fhandlers[0]}) {
    my @nextvals = map { $_->[$i] } @fhandlers[0 .. $#fhandlers];
    # @nextvals has the next line of each file
}
untie @$_ for @fhandlers;    # untie the tied arrays themselves, not the references
Beware of possible overheads if the files are very big: (0..$#{$fhandlers[0]}) traverses the whole first file just to find out how many lines it has.
Update: Added note about efficiency
Re: Naming file handles with variables?
by whakka (Hermit) on Apr 30, 2009 at 04:08 UTC
The standard way to do what you want is to open and read one file at a time, keeping whatever data you care about in variables. The example code you gave (logically) does this - it reads every line from FILE1, then FILE2, then FILE3. This is because || is short-circuited: as long as the first condition is true, the whole expression evaluates true. This keeps happening until the end of the first file, etc. (it's safer to check line existence with defined though).
Instead you could read in from standard input with a simple while ( <> ) { ... } and pipe input to the program from elsewhere. Or you could take a list of filenames as arguments in @ARGV and process them individually:
for my $file ( @ARGV ) {
    open my $fh, '<', $file or die "$file: $!";
    while ( <$fh> ) {
        ...
    }
    close $fh;
}
Perhaps I should ask: is there any particular reason you need all filehandles open at once?
while ($line1 = <FILE1> ||
       $line2 = <FILE2> ||
       $line3 = <FILE3>) {
    #do stuff with each line
}
It says, Can't modify logical or (||) in scalar assignment at pl7.pl line 28, near "<FILE3>) "
Execution of pl7.pl aborted due to compilation errors.
We have to use 'or' instead.
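For illustration, the same loop rewritten with 'or' (a sketch; it compiles because 'or' binds more loosely than the assignments, but note that it still short-circuits exactly as described above, so FILE1 is read to exhaustion before FILE2 is touched):
while ($line1 = <FILE1> or
       $line2 = <FILE2> or
       $line3 = <FILE3>) {
    # do stuff with each line
}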
You can also wrap the assignment in parentheses, but I qualified my statement with "logically" to address what the code was conceptually doing.
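For example, a sketch of the parenthesized form, which also compiles with ||:
while (($line1 = <FILE1>) ||
       ($line2 = <FILE2>) ||
       ($line3 = <FILE3>)) {
    # do stuff with each line
}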
Re: Naming file handles with variables?
by lakshmananindia (Chaplain) on Apr 30, 2009 at 03:42 UTC
...I realize that you can open multiple files at one time..
while ($line1 = <FILE1> ||
       $line2 = <FILE2> ||
       $line3 = <FILE3>) {
    #do stuff with each line
}
The above will not open multiple files. The <FILE1> operator just reads from the filehandle FILE1, which has to have been opened already using open.
You can say open my $fh, '<', 'file1' and then read from it using while (<$fh>).
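A minimal sketch of that (the file name is just a placeholder):
open my $fh, '<', 'file1' or die "Can't open file1: $!";
while (my $line = <$fh>) {
    # do stuff with $line
}
close $fh;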
--Lakshmanan G.
The great pleasure in my life is doing what people say you cannot do.
Re: Naming file handles with variables?
by happy.barney (Friar) on Apr 30, 2009 at 09:15 UTC
use IO::File;
@handles = grep defined, map {
    IO::File->new($_, 'r')
        || warn ($_, ': cannot open: ', $!)
        && undef    # warn returns true, so '&& undef' leaves undef for grep to drop
} @file_names;
while (@list = grep defined, map scalar <$_>, @handles) {
    for $line (@list) {
        # do stuff with $line
    }
}
Re: Naming file handles with variables?
by koptons (Initiate) on Apr 30, 2009 at 12:49 UTC
Perl has a command line option '-n' that supports processing multiple files at the same time. Here $_ will hold each line. This option is much faster than using a FileHandle for the cases where the number of files is not known. Look for the -n option in perlrun.
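For example (a sketch; the file names are placeholders):
perl -ne 'print "$ARGV: $_"' file1.txt file2.txt file3.txt
which prints every line of each named file, prefixed with the file it came from.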
No, not quite - from perlrun, we see:
-n
causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk: ...
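That assumed loop is essentially while (<>) { ... }, i.e. the named files are read one after another, not side by side.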
Since it [the -n option] causes perl to iterate over the files in turn, it self-evidently doesn't meet the requirements of the OP.
A user level that continues to overstate my experience :-))
Re: Naming file handles with variables?
by NiJo (Friar) on Apr 30, 2009 at 17:30 UTC
Your problem has been solved even before perl existed. 'paste' combines files in a column by column fashion. I'd use bits and pieces like these:
my $command = 'paste ' . join(' ', @ARGV);
open my $fh, '-|', $command or die "Can't run paste: $!";
while (<$fh>) {
    my @one_line = split "\t", $_;
    # do something
}
But I suspect that your application needs something more efficient, as the solutions presented up to now cause many disk seeks. Ignoring caches, there is a seek for each line of each file!
I'd 'slurp' all files into an @array[file_no][line_no] structure, one file after another. This is done linearly and all at once. If the array does not fit into RAM, I'd use Tie::File.
See e. g. http://www.sysarch.com/Perl/slurp_article.html
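A minimal sketch of that slurping approach (assuming, as above, that the file names arrive in @ARGV):
my @lines;    # $lines[$file_no][$line_no]
for my $file_no (0 .. $#ARGV) {
    open my $fh, '<', $ARGV[$file_no] or die "Can't open $ARGV[$file_no]: $!";
    $lines[$file_no] = [<$fh>];    # slurp the whole file as a list of lines
    close $fh;
}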
However 'Ignoring caches' and other real world aspects of programming leads to bad decisions. In this case, for example, caching means that the line-at-a-time technique is likely to scale very well for extremely large files, whereas slurping the files is likely to lead to thrashing the hard disk even for fairly modest (by today's standards) file sizes.
In general slurping is a bad design choice, and for a line-by-line task such as the OP indicates, Tie::File is likely to be (at best) little more than syntactic sugar, and at worst may impose significantly more overhead than the multiple file handle solutions already offered.
True laziness is hard work