redbeard has asked for the wisdom of the Perl Monks concerning the following question:

I'm attempting to munge together some data from an arbitrary set of files all stored in a single directory. Because I constantly have to cross-compare information from the data files, advancing one line in one file but not in the others depending on the results, I thought that creating an array of filehandles, to be dealt with later using foreach and while, might be a good way to go. However, I'm having some issues with the array of filehandles that (even after reading the wonderful answers here) have left me somewhat puzzled. Namely, for the code:
    my @nbdc_filehandles;
    my @nbdc_data;
    my $index=0;
    foreach my $file (@nbdc_files){
        open(FILE, "./$nbdc_dir/$file") || die "Can't open ./$nbdc_dir/$file";
        $nbdc_filehandles[$index] = *FILE;

        #get the first set of data values
        < $nbdc_filehandles[$index] >;
        < $nbdc_filehandles[$index] >;
        chomp; #get rid of trailing \n
        $nbdc_data[$index]=\split(/\t/, $_);
        $index++;
    }
which merely opens each file, reads through to the second line (to skip the header), and then splits that second line of data, I get the errors:
    Use of uninitialized value in scalar chomp at ./temp_merge.pl line 46.
    Use of uninitialized value in split at ./temp_merge.pl line 47.
I know it's opening the files properly, as the script doesn't die and a later use of the filename is fine; the problem is just with accessing the data. Any thoughts?

Replies are listed 'Best First'.
Re: Using a filehandle tucked into an array
by almut (Canon) on Dec 30, 2006 at 16:21 UTC

    Couple of things :)

    (1) You need to localize your global filehandle FILE. Otherwise all filehandles stored away in the array will refer to the most recently opened file...

        local *FILE;
        open FILE, "<", "./$nbdc_dir/$file" or die ...
        ...

    Or, even better, make use of the feature of more modern versions of Perl to accept a lexical variable in the open() statement:

        open my $fh, "<", "./$nbdc_dir/$file" or die ...
        $nbdc_filehandles[$index] = $fh;

    or even simply (if @nbdc_filehandles is lexical)

    open $nbdc_filehandles[$index], "<", "./$nbdc_dir/$file" or die ...

    The diamond operator is somewhat "special" syntactically, in that

    (2) you need to make an explicit assignment if you use it anywhere other than as the condition of a while loop:

     my $line = <$fh>;

    or

     $_ = <$fh>;

    (3) no whitespace is allowed within the angle brackets:

        $_ = <$fh>;     # OK
        $_ = < $fh >;   # not OK

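    (4) and, as jettero and BrowserUk pointed out, it doesn't accept array expressions: anything inside the angle brackets other than a bareword or a simple scalar variable is treated as a glob() pattern rather than a filehandle. So you have to use an intermediate flat scalar, or readline():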

        my $fh = $nbdc_filehandles[$index];
        $_ = <$fh>;
        # instead of
        # $_ = <$nbdc_filehandles[$index]>;

    So, your code would look like

        my @nbdc_filehandles;
        my @nbdc_data;
        my $index=0;
        foreach my $file (@nbdc_files) {
            open my $fh, "<", "./$nbdc_dir/$file" or die "Can't open ./$nbdc_dir/$file";
            $nbdc_filehandles[$index] = $fh;

            #get the first set of data values
            $_ = <$fh>;
            $_ = <$fh>;
            chomp; #get rid of trailing \n
            $nbdc_data[$index] = [ split(/\t/, $_) ];
            # ...
            $index++;
        }

        # then sometime later, to reuse the stored filehandles
        my $fh = $nbdc_filehandles[$idx];
        my $line = <$fh>;
        # ...
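Points (1) and (4) above can be tried in isolation with a minimal, self-contained sketch. The scratch directory, the file names a.txt and b.txt, and their contents are made up for the illustration and are not the original poster's data. Each loop iteration gets its own lexical handle, and each stored handle is copied back into a plain scalar before the diamond operator is used on it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample data: two small files in a scratch directory.
my $dir     = tempdir( CLEANUP => 1 );
my %content = (
    'a.txt' => "header\nalpha\n",
    'b.txt' => "header\nbeta\n",
);
for my $name ( sort keys %content ) {
    open my $out, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    print {$out} $content{$name};
    close $out or die "Can't close $dir/$name: $!";
}

# Every iteration creates a fresh lexical handle, so no two array
# elements share the same underlying file.
my @handles;
for my $name ( sort keys %content ) {
    open my $fh, '<', "$dir/$name" or die "Can't open $dir/$name: $!";
    my $header = <$fh>;    # read and ignore the header line
    push @handles, $fh;
}

# Later: copy each stored handle into a simple scalar before using <>.
my @lines;
for my $fh (@handles) {
    my $line = <$fh>;
    chomp $line;
    push @lines, $line;
}

print "$_\n" for @lines;    # prints "alpha", then "beta"
```

Because the handles stay open in @handles, each file keeps its own read position, which is what the line-by-line cross-comparison in the question appears to need.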
Re: Using a filehandle tucked into an array
by BrowserUk (Patriarch) on Dec 30, 2006 at 15:51 UTC

    When you use anything other than a simple scalar variable (for example an array or hash element) to hold a filehandle, you have to replace the diamond operator with readline:

        readline( $nbdc_filehandles[$index] );        # skip the header line
        $_ = readline( $nbdc_filehandles[$index] );   # keep the first data line

    In some places, it's possible to disambiguate the use of a non-scalar filehandle using an anonymous block. For example, when using one as an indirect object for printing:

        print { $arrayOfFileHandles[ $n ] } 'some data';
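Both techniques can be combined in one short sketch: readline() for reading through an array element, and a block for printing through one. The scratch file and its two lines are invented for the example, and @fhs is just an illustrative name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file, used only for this illustration.
my $path = tempdir( CLEANUP => 1 ) . '/demo.txt';

my @fhs;    # filehandles kept in array elements

# Writing: the block tells print that the whole expression is the handle.
open $fhs[0], '>', $path or die "Can't write $path: $!";
print { $fhs[0] } "name\tvalue\n", "x\ty\n";
close $fhs[0] or die "Can't close $path: $!";

# Reading: readline() accepts any expression that yields a handle.
open $fhs[1], '<', $path or die "Can't open $path: $!";
my $header = readline( $fhs[1] );    # header line, not used further
my $line   = readline( $fhs[1] );    # first data line
chomp $line;
my @fields = split /\t/, $line;

print join( '|', @fields ), "\n";    # prints "x|y"
```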

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Using a filehandle tucked into an array
by bsdz (Friar) on Dec 30, 2006 at 15:24 UTC
    You could avoid the filehandle headaches by keeping an array of references to Tie::File arrays. For example:

        use strict;
        use Tie::File;

        my $nbdc_dir = '.';
        my @nbdc_files = qw(1.txt 2.txt 3.txt);
        my @nbdc_tied_files;
        my @nbdc_data;
        my $index=0;

        foreach my $file (@nbdc_files) {
            tie my @tmparray, 'Tie::File', "./$nbdc_dir/$file"
                or die "Can't open ./$nbdc_dir/$file";
            push @nbdc_tied_files, \@tmparray;

            #get the first set of data values
            # row 0 of each tied array is the header, so the data starts at row 1
            $_ = $nbdc_tied_files[$index][1];
            # keep the fields as an array ref; split in scalar context returns only a count
            $nbdc_data[$index] = [ split(/\t/, $_) ];
            $index++;
        }
    Update: Set $_ to value before split. Though I still don't understand what the reference to split output is for.
Re: Using a filehandle tucked into an array
by jettero (Monsignor) on Dec 30, 2006 at 14:39 UTC

    You're almost there. I'd try something like this: open( $nbdc_filehandles[$index], $filename) or die $!. But $nbdc_filehandles[$index] = \*FILE should work also. I always have trouble reading from an array index though. There's probably a better way to do it, but I end up using my $scalar = $a[0]; my $line = <$scalar>

    UPDATE: perlapio always turns out to be a much stranger animal than I ever think. It appears that you can push either \*FILE or *FILE and both work just fine (even though they are different) inside the <>'s — which don't appear to be operators the way I'm used to thinking of them.

    -Paul

      I indeed tried both of those solutions. Actually, putting the array element into the open statement was how the code was originally written, but it threw the same error. So I changed it to the * (and tried \*): again, no dice, same problem. Hrm.

        Right, but my post was a two parter. The first part was the (wrong) suggestion to try \*FILE. The second part was to suggest $f=$a[0]; <$f>; because that's the only way I ever get it to work without using IO::Handle or something.

        -Paul

Re: Using a filehandle tucked into an array
by johngg (Canon) on Dec 30, 2006 at 19:27 UTC
    I think I would do this using IO::File from the outset and would store information in a ref. to a HoH keyed by filename rather than in an array. I would start with a list of files to be processed in an array and would populate the hash with "handle" and "buffer" key/value pairs accessed via subroutines to keep it all in step. This way I have the advantage (if needed) to work with files both by name and by their index in a list of files.

    In the script below I am reading four data files, spw592341.dataA to D, each containing a few lines that can easily be identified. Here's the script

        use strict;
        use warnings;

        use Data::Dumper;
        use IO::File;

        print qq{Files to read\n};
        my @dataFiles = glob q{spw592341.data*};
        print qq{ $_\n} for @dataFiles;
        print qq{\n};

        my $rhFiles = {};

        foreach my $file (@dataFiles) {
            openFile($file);
            getLine($file);
        }
        showData(q{After reading first line});

        foreach my $file (@dataFiles) {
            getLine($file);
        }
        showData(q{All files read again});

        getLine($dataFiles[2]);
        showData(q{Read third file in list});

        getLine(q{spw592341.dataD});
        showData(q{Read spw592341.dataD});

        foreach my $file (@dataFiles) {
            closeFile($file);
        }
        showData(q{After closing all files});

        sub closeFile {
            my $file = shift;
            $rhFiles->{$file}->{handle}->close()
                or die qq{close: $file: $!\n};
            delete $rhFiles->{$file}->{handle};
        }

        sub getLine {
            my $file = shift;
            $rhFiles->{$file}->{buffer} =
                $rhFiles->{$file}->{handle}->getline();
            chomp $rhFiles->{$file}->{buffer};
        }

        sub openFile {
            my $file = shift;
            my $fh = IO::File->new($file, O_RDONLY)
                or die qq{open: $file: $!\n};
            $rhFiles->{$file}->{handle} = $fh;
        }

        sub showData {
            my $msg = shift;
            print qq{\n$msg\n};
            my $dd = Data::Dumper->new(
                [$rhFiles], [qw{rhFiles}])->Indent(1);
            print $dd->Dumpxs();
        }

    The output is just Data::Dumper output of the hash ref. used to hold the data.

    I hope this is of use.

    Cheers,

    JohnGG

      It's off-topic, but I'm really curious about your use of quoting constructs in place of quoted strings. What's your reasoning here? I've never seen it before, and I assume there's an interesting reason.

        My background is *nix, specifically SunOS/Solaris, and I had been quite happy using normal quoted strings for years. Then I installed Active Perl on a PC at home but really struggled to write one-liners because of the quoting conventions on MS Windows. When I found out about q{...} and qq{...} and applied them in one-liners they made life easy again. I have now got into the habit of using them all the time so that I don't have to make adjustments when I move between systems.

        That's the only reason really. They involve a little more typing and they are not as familiar to most as quoted strings but, for me, they are more convenient.

        Cheers,

        JohnGG

        i do this a lot too. i just feel it's superior. for one, you so often need to use " or ' in your strings, and this way you're free to. the trade-off is that you can't use unbalanced forms of your quote marker (so, if you use [ as in qq[], you can't write qq[ :[ = sad face ]; that's a syntax error), but in the rare case where you must use unbalanced brackets in a string, it's probably only of one kind, so just use qq with another. there's something good about forcing balance too; i'd use balanced double quotes if we had them (on the same level of accessibility).

        another argument is that if you want to change your double or single quote status for some reason, you just have to change the "front" of the string, which is usually easier to find and, obviously, also half as much to search for.
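A small sketch of the quoting behaviour described in this sub-thread; the variable names and strings are invented for the illustration. It shows both quote characters used without escapes, balanced nested delimiters, and the switch to a different delimiter when a bracket has to appear unbalanced.

```perl
use strict;
use warnings;

my $name = 'Perl';

# q{} behaves like single quotes: no interpolation, and ' and " need no escaping.
my $single = q{It's "quoted" and $name stays literal};

# qq{} behaves like double quotes; balanced nested braces are allowed.
my $double = qq{Hello, $name! {nested} braces are fine};

# An unbalanced [ would break qq[...], so pick another delimiter instead.
my $sad = qq<unbalanced :[ is fine here>;

print "$single\n$double\n$sad\n";
```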


        It's not what you look like, when you're doin' what you’re doin'.
        It's what you’re doin' when you’re doin' what you look like you’re doin'!
             - Charles Wright & the Watts 103rd Street Rhythm Band, Express yourself
Re: Using a filehandle tucked into an array
by Moron (Curate) on Jan 02, 2007 at 16:12 UTC
    There are two ways I do this, depending on the Perl version. For older versions I just use a reference to the filehandle glob instead of the glob itself, i.e.:
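        foreach my $file (@nbdc_files){
            # "or", not "||": "||" would bind to the filename string, so die could never run
            open \*FILE, "./$nbdc_dir/$file" or die "Can't open ./$nbdc_dir/$file";
            # caution: every \*FILE refers to the same glob, so all entries share one handle
            push @nbdc_filehandles, \*FILE;
            # etc.
        }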
    This is necessary to get open to write the fh back, although these days a more common idiom for the same thing is:
        foreach my $file (@nbdc_files){
            # "or", not "||": "||" would bind to the filename string, so die could never run
            open my $fh, "./$nbdc_dir/$file" or die "Can't open ./$nbdc_dir/$file";
            push @nbdc_filehandles, $fh;
            # etc.
        }
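One detail worth checking whenever open is written without parentheses is the choice between `||` and `or` for the error check. The following minimal sketch shows the difference; the path is hypothetical and assumed not to exist, and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this.
my $missing = './no-such-dir-for-demo/no-such-file.txt';

# "||" binds tighter than the list operator open, so this parses as
#   open( my $fh1, '<', ( $missing || die ... ) );
# The path string is true, die is never reached, and the failure goes unnoticed.
my $died_with_pipes =
    eval { open my $fh1, '<', $missing || die "unreachable\n"; 1 } ? 0 : 1;

# "or" binds more loosely than the whole open call, so die runs when open fails.
my $died_with_or =
    eval { open my $fh2, '<', $missing or die "Can't open $missing: $!\n"; 1 } ? 0 : 1;

print "|| version died: $died_with_pipes\n";    # prints 0
print "or version died: $died_with_or\n";       # prints 1
```

Writing the call with parentheses, as in open(...) || die ..., is the other common way to keep `||` working as intended.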

    -M

    Free your mind