snra_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am using PerlIO::gzip to read tar files without extracting them, but I run into problems when I come across a corrupt file.

Basically, I am writing a log parsing application which parses the system's log files, which are stored in tgz format. The script hangs, and sometimes runs out of memory, when it tries to read a corrupt file.

Is there any way to tell whether a file is corrupt and, if it is, skip opening and parsing that file?

Thanks !!!

Re: PerlIO::gzip - Handling Corrupt tar file
by almut (Canon) on Sep 28, 2009 at 15:44 UTC

    AFAICT, from a quick look at the source and some experimentation, PerlIO::gzip doesn't do integrity checking. So if you can't avoid occasionally encountering corrupt files, you'll probably have to resort to an external tool (even though this is of course suboptimal from a performance point of view...). For example, the good old gzip/gunzip program has a -t/--test option that performs an integrity check.
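    A minimal sketch of that approach, assuming a gzip binary is on the PATH (on Windows, e.g. via Cygwin). The helper name gzip_test_ok is hypothetical. It runs gzip -t on each file and skips anything that fails, before the file is ever opened with the :gzip layer:

```perl
use strict;
use warnings;

# Hypothetical helper: true if `gzip -t` reports the file as intact.
# Assumes an external gzip binary is available on the PATH.
sub gzip_test_ok {
    my ($path) = @_;
    # List form of system() bypasses the shell, so odd file names are safe.
    # gzip -t exits 0 for a good file and non-zero otherwise; system()
    # returns -1 if gzip could not be started, which also counts as "not ok".
    my $rc = system('gzip', '-t', '--', $path);
    return $rc == 0;
}

# Example driver: check every file named on the command line.
for my $log (@ARGV) {
    unless (gzip_test_ok($log)) {
        warn "Skipping $log: failed gzip integrity test\n";
        next;
    }
    print "$log looks intact\n";
}
```

    This reads each file twice (once to test, once to parse), which is the performance cost mentioned above.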

    Update: you could also try IO::Uncompress::Gunzip instead of the PerlIO layer. It has a Strict option which performs a number of integrity checks...
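    A sketch of that, assuming IO::Uncompress::Gunzip (shipped with recent Perls as part of IO-Compress) is available. The helper name gunzip_ok is hypothetical. It decompresses the whole file with Strict => 1 and Transparent => 0 and reports whether it reached the end without an error:

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: decompress the whole file and report whether it
# reached the end without an error.
#   Strict      => 1  turns on the extra checks, e.g. the trailer CRC32/ISIZE
#   Transparent => 0  treats non-gzip input as an error instead of passing it through
# Returns (1, '') for an intact file, (0, $message) otherwise.
sub gunzip_ok {
    my ($path) = @_;
    my $z = IO::Uncompress::Gunzip->new($path, Strict => 1, Transparent => 0)
        or return (0, $GunzipError);
    my ($buf, $status);
    # read() returns > 0 while data is coming, 0 at end of file, < 0 on error.
    1 while ($status = $z->read($buf)) > 0;
    my $err = $status < 0 ? $GunzipError : '';
    $z->close;
    return ($status == 0 ? 1 : 0, $err);
}
```

    Validating first and parsing afterwards decompresses everything twice. Alternatively, one could parse straight from $z (it also offers a getline method) and treat a negative read as the signal to abandon that file.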

Re: PerlIO::gzip - Handling Corrupt tar file
by jakobi (Pilgrim) on Sep 28, 2009 at 15:48 UTC

    Out of memory as a failure mode is a bit annoying...

    Have you already tried file, zcat | wc, and zcat | tar tvf - on the bad tarballs (using Cygwin if you're on Windows)?

    Usually this should be enough to pinpoint a gzip format problem, and it may already give you enough hints on how to spot your problem cases with simple Perl tests like -s.
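    A cheap pre-filter along those lines, as a sketch: it combines -s with a check for the two gzip magic bytes (0x1f 0x8b). It only catches empty files and files that are not gzip at all, so a truncated or bit-flipped archive will still pass. The helper name looks_like_gzip is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical cheap pre-filter: rejects missing/empty files and files whose
# first two bytes are not the gzip magic number (0x1f 0x8b).
# It does NOT detect truncation or corruption further into the stream.
sub looks_like_gzip {
    my ($path) = @_;
    return 0 unless -f $path && -s _;    # must be a non-empty plain file
    open(my $fh, '<:raw', $path) or return 0;
    my $got = read($fh, my $magic, 2);
    close $fh;
    return defined $got && $got == 2 && $magic eq "\x1f\x8b";
}
```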

    Otherwise, it's the usual routine: check perldoc -m for module settings of interest, then fall back on the catch-all of 'use diagnostics' and adding instrumentation/warn statements all over the place. You can also use `zcat ...` or `tar --to-stdout -zxvf TARBALL FILE` as alternatives to PerlIO.
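    A sketch of the zcat alternative, assuming a Unix-like system with gzip on the PATH (list-form piped open). The helper name each_gz_line is hypothetical. It streams the file through gzip -dc instead of the PerlIO layer and uses gzip's exit status, which close() reports for a piped open, to tell whether the whole stream decompressed cleanly:

```perl
use strict;
use warnings;

# Hypothetical helper: stream a .gz file through an external `gzip -dc`
# (equivalent to zcat), passing each line to a callback. Returns 1 only if
# gzip exited cleanly, i.e. the whole stream decompressed without error.
# Assumes gzip is on the PATH and list-form piped open is supported.
sub each_gz_line {
    my ($path, $cb) = @_;
    open(my $fh, '-|', 'gzip', '-dc', '--', $path)
        or return 0;                     # could not start gzip at all
    while (my $line = <$fh>) {
        $cb->($line);
    }
    # For a piped open, close() returns false if the child exited non-zero;
    # the child's exit status is then available in $?.
    return close($fh) ? 1 : 0;
}
```

    Note that the callback has already been handed whatever lines came out before gzip hit the damage, so a caller that must not act on partial data would need to buffer the results and discard them on failure.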

Re: PerlIO::gzip - Handling Corrupt tar file
by LesleyB (Friar) on Sep 28, 2009 at 15:39 UTC

    Posting an example of the relevant code might help here.

    Are you, for instance, using the default layer arguments on PerlIO::gzip? I do not know, I cannot tell, and I smashed my crystal ball up last week.

    Do you check if the file opened successfully?

      Here is the snippet:

      if ( $log =~ /\.tar\.gz$/ ) {
          unless ( open LOG, "<:gzip", $log ) {
              log_msg( LOG_ERR, "Cannot open $log for read: File may be corrupt ($!)" );
              next;
          }
      }


      I also tried the lazy argument. The thing is, when I try to untar those files manually, I get some errors, but the script still opens those files and tries to read them.