shmf has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm new to the Perl language and to Linux, but I would like to ask for your help.
I'm working on Linux, and the files I need to process are in "file.gz" format.

I don't know if my problem is due to Perl or to Linux itself, so I'm very lost at the moment.
The problem arises when I try to decompress and then open about 300 files with my Perl script. (I need to do that because I'm analyzing some logs, and some of the data needs to be kept in memory during the process. Uncompressing all the files at once is unthinkable because it would require too much disk space.)

First I tried doing, in my script, system "zcat file.gz > file", followed by "open FILEHANDLE MODE" with MODE "<".
As that didn't work as I expected, I then tried "open FILEHANDLE MODE" with MODE "|", which, as you know, causes the filename to be interpreted as a Linux command whose output is piped to my script.
The problem I found was exactly the same.
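
In simplified form, the two attempts looked roughly like this (not my exact lines, just the pattern):

# First attempt: decompress with zcat into a plain file, then open that file.
system("zcat file.gz > file") == 0 or warn "zcat failed: $?";
open FILEHANDLE, "<", "file" or warn "open failed: $!";

# Second attempt: pipe zcat's output straight into the script.
open FILEHANDLE, "zcat file.gz |" or warn "open failed: $!";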

I've created a temporary workaround, but it is not acceptable because it requires a lot of manual work from me.

Here's a more detailed description of my problem and my script:

I created a script which should decompress, open and then delete nearly 400 files.
To do that I use the Perl function "open" with the option "|" after the filename.

At the beginning the script works fine, but after about 300 files have been processed I get an error from the open call:

"dealError: Could not open file /home/Logs/backup/file.1183204803".

and the script exits.

I changed my script so I could continue processing, but the solution I found is not acceptable:

I created the function "dealError", and I call it if the "open" call in my "proc" function returns an error.
What my "dealError" function does is simply wait for some text to be entered on standard input. If that text is "GO", then "dealError" tries again to open the file which failed before.
This allows me to decompress the problematic file manually in the Linux shell and then type "GO" in my script's shell.
If I do that, my script opens the file and continues execution normally, but only until it needs to process another file.
When it needs to zcat and open the next file, the problem is exactly the same.

If I keep decompressing the files manually in the Linux shell and typing "GO", the script does eventually finish its work correctly, but it is unacceptable for me to do all this manual work for about 100 files!! (I've tested it with fewer files and it worked.)

So the problem is not with the files, since I can decompress them manually. I thought that maybe the problem could be in "open" or "zcat"...

Can you help me?

Thank you in advance.

(PS: Sorry for my bad English.)

Here's what I think is the relevant part of my script:
(Note: The function dealError is at the bottom since it is not very relevant. I submitted it anyway, in case you wish to look at it.) Code: ( perl )

use Cwd;
my $path = cwd();

sub proc {
    my ($file) = @_;
    # (...)
    $pathC = $path . "/$file";
    open INPUT, "zcat $pathC|" or dealError($cpath);
    while ($line = <INPUT>) {
        eval($line);
        # (...) data proccessing irrelevant to the problem
    }
    close INPUT;
}

sub init {
    # (...) Initializing @lista ("system "ls"" plus a bunch of tests)
    for my $fileName (@lista) {
        proc($fileName);
    }
}

init;

#------------------------------------------------------------------

sub dealError {
    my ($path) = @_;
    open(STDIN, "-"); # Opens standard INPUT
    print "\n!!ATENTION:\n\tdealError: Could not open file $cpath\n";
    print "Check the problem with the file and type GO to continue\n";
    while ($line = <STDIN>) {
        if ($line =~ m/GO/) {
            open INPUT, "$cpath" or dealError($path);
            close STDIN;
            last;
        }
        else {
            print "Check the problem with the file and type GO to continue\n";
        }
    }
}

Replies are listed 'Best First'.
Re: Is there a limit of files I can decompress and then open in a script, if I close each one before opening another?
by ikegami (Patriarch) on Jul 25, 2007 at 14:54 UTC

    Operating systems limit how many files are opened *at a time*, but they don't limit how many files a process can open over its lifetime.
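
    For instance (a quick demonstration, not your code), this loop opens and closes far more handles than any "at a time" limit allows, and it runs without error because each handle is closed before the next is opened:

    # Only one handle is ever open at a time, so no per-process
    # limit on open files is ever reached.
    for my $i (1 .. 10_000) {
        open my $fh, '<', $0 or die "open failed: $!";   # read the script itself
        close $fh;
    }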

    I don't see anything that would cause your problem, but I do have issues.

    • Use lexical file handles. It will ensure they get cleaned up and closed even if the subroutine exits abnormally.

    • What is open(STDIN, "-");? "-" means STDIN, so you're opening STDIN to be the same as STDIN?!

    • Also, calling dealError from within dealError is problematic.

    • Finally, what is dealError trying to do? Short of a missing floppy in the drive, I don't know of an error that can be "checked" and fixed.

    I'd recommend:

    sub proc {
        my ($file) = @_;
        # (...)
        my $pathC = $path . "/$file";
        open my $input, "zcat $pathC|" or do {
            print STDERR "\n!!ATTENTION:\n\tUnable to run zcat: $!\n";
            return;
        };
        while (my $line = <$input>) {
            # (...) data processing irrelevant to the problem
        }
        # Not needed. Will happen automatically
        # when $input goes out of scope.
        #close($input);
    }
Re: Is there a limit of files I can decompress and then open in a script, if I close each one before opening another?
by FunkyMonk (Bishop) on Jul 25, 2007 at 14:58 UTC

    I doubt either perl or your OS has a problem opening & closing 300/400 files. So I'm just offering general advice.

    • Your program uses $pathC and $cpath. Is this a typo? Adding use strict; and use warnings; would catch that and many other errors (if you're not using them already, of course); see the small sketch after this list.
    • Perl will tell you why the open failed in the special variable $!.
    • Is there a good reason why you're opening & closing STDIN? I doubt it's necessary.
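
    As a small sketch (the path is only an example, not from your script), with strict and warnings enabled the $pathC/$cpath mix-up becomes a compile-time error instead of a silently undefined filename:

    use strict;
    use warnings;

    my $pathC = "/home/Logs/backup/file.1183204803.gz";   # example path only
    # Writing $cpath here by mistake would now refuse to compile:
    #   Global symbol "$cpath" requires explicit package name
    open my $input, "zcat $pathC |" or die "Cannot open $pathC: $!";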
Re: Is there a limit of files I can decompress and then open in a script, if I close each one before opening another?
by cdarke (Prior) on Jul 25, 2007 at 14:59 UTC
    In addition to the above, when reporting an error from functions like open it is a good idea to include $! in your error message. This will give you the operating system error message and a better idea of what the problem is.
Re: Is there a limit of files I can decompress and then open in a script, if I close each one before opening another?
by djp (Hermit) on Jul 26, 2007 at 03:33 UTC
    Recommend PerlIO::gzip for reading gzipped files, e.g.
    use strict;
    use warnings;
    use IO::File;
    use PerlIO::gzip;

    my $ifh = IO::File->new( $ifile, '<:gzip' )
        or die "Cannot open $ifile: $!\n";
    while (<$ifh>) {
        ...
    }
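
    The same :gzip layer also works with a plain three-argument open, e.g. (the filename here is just an example):

    use strict;
    use warnings;
    use PerlIO::gzip;

    my $ifile = 'file.1183204803.gz';   # example name
    # The :gzip layer decompresses transparently while the file is read.
    open my $ifh, '<:gzip', $ifile or die "Cannot open $ifile: $!\n";
    while (my $line = <$ifh>) {
        # process $line
    }
    close $ifh;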
Re: Is there a limit of files I can decompress and then open in a script, if I close each one before opening another?
by DrHyde (Prior) on Jul 26, 2007 at 10:03 UTC
    You say you're dealing with log files. And then you try to eval() each line that you read. Don't you think that maybe trying to eval something that is pretty much guaranteed to not be perl code is a Really Bad Idea?
      Well,
      the logs I'm analysing were created by a Perl server which simply dumps some hashes into files every second. (It's not my code.)
      So I think I can safely "eval" them.
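
      A small sketch of what I could add (not what the script does now): check $@ after each eval, so a bad line is reported instead of silently ignored:

      # Each line is a piece of Perl code dumped by the server; eval runs it,
      # and $@ is set if the line turns out not to be valid Perl.
      eval $line;
      warn "Could not eval line: $@" if $@;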
Re: Is there a limit of files I can decompress and then open in a script, if I close each one before opening another?
by hilitai (Monk) on Jul 26, 2007 at 22:17 UTC
    I agree with the previous message that recommended the use of $!. You need to find out why the open command isn't working. The error message from dealError() isn't telling you anything useful. Try replacing your open line with:
    open INPUT, "zcat $pathC|" or die "Can't open $pathC: $!";