
File integrity checker

by roperl (Beadle)
on Aug 24, 2017 at 16:03 UTC (node #1197939)

roperl has asked for the wisdom of the Perl Monks concerning the following question:

Is there any way in Perl to check the integrity of .gz and .zip files without calling the external gzip or unzip programs?
The gzip and unzip binaries both have a -t option for checking integrity.
I'm currently using Archive::Zip to unzip and IO::Uncompress::Gunzip to uncompress .gz files, and I would like to check each file's integrity before attempting decompression or extraction. I already verify that the file is of the correct type with File::Type by checking its MIME type, but that only looks at the header; it won't detect corruption further into the file.
Any help here would be appreciated.
use File::Type;

my $ft   = File::Type->new();
my $type = $ft->mime_type($filename);
if ( $type eq "application/zip" )    { ..... }
if ( $type eq "application/x-gzip" ) { .... }
Relevant options of gzip and unzip:
$ gzip -h
Usage: gzip [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place).
.....
  -t, --test        test compressed file integrity
....
$ unzip -h
...
Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir;
  file[.zip] may be a wildcard.  -Z => ZipInfo mode ("unzip -Z" for usage).
...
  -f  freshen existing files, create none
  -t  test compressed archive data
..

Re: File integrity checker
by no_slogan (Deacon) on Aug 24, 2017 at 16:21 UTC
    Would like to check the integrity of the file before attempting uncompression or extraction.
    There's not really any way to do this. Zip and gzip store checksums of the uncompressed contents; the -t options actually uncompress the file, calculate the checksum, and throw away the contents. You could do the same thing in your program, or you could uncompress to a tempfile and delete it if the checksum comes out bad.

    Edit: There's a section about integrity checking in the Archive::Zip::FAQ.
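    For .gz files, the "uncompress, compute the checksum, throw the contents away" approach can be done in-process. Below is a minimal sketch assuming IO::Uncompress::Gunzip, the module already in use; gzip_ok is a made-up helper name. Strict => 1 makes the module compare the CRC-32 and length stored in the gzip trailer against the data it actually inflated, and Transparent => 0 stops it from silently passing a non-gzip file through unchanged. Only the first member of a concatenated gzip file is read unless MultiStream => 1 is also passed.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Sketch of an in-process `gzip -t`. gzip_ok is a hypothetical helper name.
# Decompresses the whole stream, discards the output, and relies on
# Strict => 1 to check the trailer's CRC-32 and length.
# Returns (1, '') if the file looks intact, or (0, $error_message).
sub gzip_ok {
    my ($file) = @_;

    my $z = IO::Uncompress::Gunzip->new($file, Strict => 1, Transparent => 0)
        or return (0, $GunzipError);

    my $buf;
    while (1) {
        # read() returns bytes produced, 0 at end of stream, < 0 on error
        my $status = $z->read($buf, 64 * 1024);
        return (0, $GunzipError) if $status < 0;
        last if $status == 0;
    }
    $z->close;
    return (1, '');
}
```

    Usage: `my ($ok, $err) = gzip_ok('file.gz'); warn "file.gz is damaged: $err\n" unless $ok;`. The data is decompressed in 64 KB chunks, so memory use stays flat regardless of file size.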

      Thanks. Yes, the integrity-check section in Archive::Zip::FAQ seems to be exactly what I need for zip files.
      I'm already gunzipping the .gz files to a temporary file name. So how would I get the checksum of the uncompressed file, and what do I compare it to?

        The "checksum" is the same as the "CRC" referenced in the FAQ. The FAQ mentions example code in the CPAN distribution that shows how to calculate the CRC.
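        A rough sketch of such a CRC check, assuming Archive::Zip's crc32 and contents member methods and its computeCRC32 utility function. It is not the FAQ's example code, and zip_crc_problems is a hypothetical name. Each member is decompressed in memory, so very large members would call for reading in chunks instead.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

# Sketch: decompress every member, compute the CRC-32 of the result and
# compare it with the CRC-32 recorded in the archive.
# zip_crc_problems is a hypothetical name. Returns a list of problem
# descriptions; an empty list means every member checked out.
sub zip_crc_problems {
    my ($file) = @_;
    my $zip = Archive::Zip->new();
    return ("cannot read $file") unless $zip->read($file) == AZ_OK;

    my @problems;
    for my $member ($zip->members) {
        next if $member->isDirectory;
        my $name = $member->fileName;

        # In list context contents() returns (uncompressed data, status).
        my ($data, $status) = $member->contents;
        if ($status != AZ_OK) {
            push @problems, "$name: decompression failed";
            next;
        }
        my $crc = Archive::Zip::computeCRC32($data);
        push @problems,
            sprintf('%s: CRC mismatch (stored %08x, computed %08x)',
                    $name, $member->crc32, $crc)
            if $crc != $member->crc32;
    }
    return @problems;
}
```

        Usage: `my @bad = zip_crc_problems('file.zip'); die join("\n", @bad), "\n" if @bad;`.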

        IO::Uncompress::Gunzip will check the CRC-32 (and length) stored in the gzip trailer for you if you set the Strict option:
        use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
        gunzip 'file.gz', 'file.out', Strict=>1 or die $GunzipError;
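        Since the .gz files are already being gunzipped to a temporary file, the Strict check and the "delete it if the checksum comes out bad" idea can be combined. A sketch, assuming the temporary file may live in the same directory as the final file so that rename() stays on one filesystem; gunzip_checked and the .gunzip-XXXXXX template are illustrative names only.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Sketch (gunzip_checked is a hypothetical name): decompress $src into a
# temporary file next to $dest and rename it to $dest only if gunzip,
# with Strict => 1 checking the trailer CRC-32 and length, succeeded.
# On failure the partial output is deleted. Returns (1, '') or (0, $err).
sub gunzip_checked {
    my ($src, $dest) = @_;

    # Same directory as $dest, so the final rename() is a cheap move.
    my ($tmp_fh, $tmp) = tempfile('.gunzip-XXXXXX', DIR => dirname($dest));
    close $tmp_fh;

    # Transparent => 0: a non-gzip input is an error, not a plain copy.
    unless (gunzip($src => $tmp, Strict => 1, Transparent => 0)) {
        my $err = $GunzipError;
        unlink $tmp;
        return (0, $err);
    }
    unless (rename $tmp, $dest) {
        my $err = "rename to $dest failed: $!";
        unlink $tmp;
        return (0, $err);
    }
    return (1, '');
}
```

        Because the file is only decompressed once and the check happens as a side effect, there is no separate "test" pass.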
Re: File integrity checker
by Anonymous Monk on Aug 24, 2017 at 18:43 UTC
    A corrupt file will throw an error and produce a non-zero return code on any attempt to decompress it, so you may as well just try to decompress the thing into a temporary location. If the return code is zero then the archive was intact and you can move the files to their permanent home. If not, discard the files that might have been extracted before the corruption was detected. There is no point in decompressing the file twice, which is effectively what would happen if you tried to test it first. It will be tested anyway.
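    Applied to zip files, that advice might look like the sketch below: each member is decompressed once, checked, written under a staging directory, and the staging directory is renamed into place only when every member passed. It assumes Archive::Zip and a destination directory that does not exist yet; extract_zip_checked and the .unzip-XXXXXX staging name are hypothetical. The CRC-32 comparison is done explicitly rather than assuming Archive::Zip's own extraction methods report CRC errors.

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Basename qw(dirname);
use File::Path qw(make_path remove_tree);
use File::Spec;
use File::Temp qw(tempdir);

# Sketch (extract_zip_checked is a hypothetical name). $dest must not exist.
# Returns (1, '') on success or (0, $error); on error nothing is left behind.
sub extract_zip_checked {
    my ($file, $dest) = @_;
    my $zip = Archive::Zip->new();
    return (0, "cannot read $file") unless $zip->read($file) == AZ_OK;

    # Staging directory next to $dest so the final rename() is a move.
    my $stage = tempdir('.unzip-XXXXXX', DIR => dirname($dest));
    my $fail  = sub { remove_tree($stage); return (0, $_[0]) };

    for my $member ($zip->members) {
        my $name = $member->fileName;
        # Refuse absolute paths and '..' components.
        return $fail->("unsafe member name: $name")
            if File::Spec->file_name_is_absolute($name)
            || grep { $_ eq '..' } split m{[/\\]}, $name;

        my $path = File::Spec->catfile($stage, $name);
        if ($member->isDirectory) {
            make_path($path);
            next;
        }
        # Decompress once, verify the CRC-32 while the data is in memory.
        my ($data, $status) = $member->contents;
        return $fail->("$name: decompression failed") if $status != AZ_OK;
        return $fail->("$name: CRC mismatch")
            if Archive::Zip::computeCRC32($data) != $member->crc32;

        make_path(dirname($path));
        open my $out, '>:raw', $path or return $fail->("$path: $!");
        print {$out} $data;
        close $out or return $fail->("$path: $!");
    }
    rename $stage, $dest or return $fail->("rename to $dest failed: $!");
    return (1, '');
}
```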
      Is there a way to create a corrupted zip file in which extractToFileNamed fails (does not return AZ_OK) for just one member?
        You can shoot out one of the files in a .zip with something like this:
        open my $ZIP, '+<', '' or die "Cannot open zip file: $!";  # path of the .zip to damage
        binmode $ZIP;
        local $/ = undef;                   # slurp the whole file
        my $data = <$ZIP>;
        my @offset;
        while ($data =~ /PK\x03\x04/g) {    # find all local file headers
            push @offset, pos($data) - 4;
        }
        print "Found headers at: @offset\n";
        substr($data, $offset[1] + 14, 1) ^= "\x01";   # change the CRC-32 of the 2nd member
        seek $ZIP, 0, 0;
        print $ZIP $data;
        close $ZIP;
        (This isn't foolproof, but it's probably good enough.) The structure of a zip file is described in the Wikipedia article "Zip (file format)", section "File headers".
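        To see what the +14 in the snippet above points at, here is a small sketch that decodes the fixed 30-byte local file header at each signature it finds; local_headers is a made-up name. One documented reason patching offset+14 is not foolproof: when bit 3 of the general-purpose flags is set, the CRC-32 and sizes in the local header are zero and the real values are written in a data descriptor after the member's data. Like the original snippet, this scans for the signature, so a "PK\x03\x04" that happens to occur inside compressed data would produce a bogus entry; the central directory at the end of the file is the authoritative member list.

```perl
use strict;
use warnings;

# Sketch (local_headers is a hypothetical name): decode each local file
# header found in the zip image $data. Fixed header layout, little-endian:
#   0 signature "PK\x03\x04"  4 version needed  6 flags  8 method
#  10 mod time  12 mod date  14 CRC-32  18 compressed size
#  22 uncompressed size  26 file name length  28 extra field length
# The file name follows immediately at offset 30.
sub local_headers {
    my ($data) = @_;
    my @headers;
    while ($data =~ /PK\x03\x04/g) {
        my $off = pos($data) - 4;
        last if $off + 30 > length $data;    # truncated header
        my ($sig, $ver, $flags, $method, $time, $date,
            $crc, $csize, $usize, $nlen, $xlen)
            = unpack 'a4 v v v v v V V V v v', substr($data, $off, 30);
        push @headers, {
            offset => $off,
            flags  => $flags,     # bit 3 set => CRC/sizes are in a data descriptor
            method => $method,    # 0 = stored, 8 = deflated
            crc32  => $crc,       # the 4 bytes the snippet above patches
            csize  => $csize,
            usize  => $usize,
            name   => substr($data, $off + 30, $nlen),
        };
    }
    return @headers;
}
```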
