http://qs1969.pair.com?node_id=520230


in reply to Reading partial/corrupt zip files

I am not aware of any library that can read corrupted files, but it might be possible to turn an incomplete file into a complete one by removing the last (partial) file and appending a phony footer.
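To make the "phony footer" idea concrete, here is a rough sketch of what such a footer could contain: one central directory header per recovered file, followed by an end-of-central-directory record. The layout follows my reading of the PKWARE appnote. The function name build_footer and the hash keys (name, offset, version, flags, method, time, date, crc, csize, usize) are made up for illustration, and the sketch ignores ZIP64, file comments and multi-disk archives.

```perl
use strict;
use warnings;

# Build a central directory plus end-of-central-directory record.
# $entries:   array ref of hashes describing the intact files (hypothetical keys)
# $cd_offset: byte offset at which the footer will be written, i.e. the
#             end of the last complete file in the archive
sub build_footer {
    my ( $entries, $cd_offset ) = @_;
    my $cd = '';
    for my $e (@$entries) {
        $cd .= pack 'V v6 V3 v5 V2',
            0x02014b50,              # central directory header signature
            20,                      # version made by (assumed value)
            $e->{version},           # version needed to extract
            $e->{flags}, $e->{method}, $e->{time}, $e->{date},
            $e->{crc}, $e->{csize}, $e->{usize},
            length( $e->{name} ),    # file name length
            0, 0,                    # extra field length, comment length
            0, 0,                    # disk number start, internal attributes
            0,                       # external attributes
            $e->{offset};            # offset of the file's local header
        $cd .= $e->{name};
    }
    my $count = scalar @$entries;
    my $eocd = pack 'V v4 V2 v',
        0x06054b50,                  # end-of-central-directory signature
        0, 0,                        # this disk, disk holding the directory
        $count, $count,              # entries on this disk, entries in total
        length($cd), $cd_offset,     # directory size and start offset
        0;                           # archive comment length
    return $cd . $eocd;
}
```

To repair a file you would truncate it at $cd_offset (which drops the partial last file) and append the returned bytes. I have not tried this against every unzip tool, so treat it as a starting point.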

Look at the ZIP file format:
http://www.pkware.com/business_and_developers/developer/appnote/
The files in the zip are not connected to each other, so it is possible to read through the file, parsing each file as it comes and bailing out if the file ends unexpectedly:
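#!/usr/bin/perl
use strict;
use warnings;

open( my $z, '<', 'test.zip' ) or die "Cannot open test.zip: $!";
binmode $z;

sub error { print "The file is corrupt.\n"; exit; }

# Read exactly $n raw bytes, or bail out if the file ends early.
sub readstr {
    my ($n) = @_;
    my $got = read( $z, my $received, $n );
    error() unless defined $got && $got == $n;
    return $received;
}

# Read an $n-byte unsigned integer. ZIP stores integers little-endian,
# so the last byte read is the most significant one.
sub readint {
    my ($n) = @_;
    my $received = 0;
    $received = $received * 256 + $_ for reverse unpack( 'C*', readstr($n) );
    return $received;
}

while ( !eof $z ) {
    my $head           = readstr(4);    # PK^C^D
    my $versions       = readstr(4);    # ...
    my $filenamelength = readint(2);    # ...
    # Parse the rest of this file
    # Until we get an error or go on to the next file header
}
close($z);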

My code doesn't actually produce a correct footer for the file, but it should start you off.
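For what it's worth, here is a fuller sketch of the same walk that reads each 30-byte local file header in one go and keeps only the files whose data is completely present. The field layout is my reading of the appnote. The name scan_local_headers and the hash keys are invented for this example. It gives up on entries that set flag bit 3 (sizes stored in a trailing data descriptor) and does not handle ZIP64.

```perl
use strict;
use warnings;

# Return one hash per file whose header and data are fully present.
sub scan_local_headers {
    my ($fh) = @_;

    # Find the total size by seeking to the end, then rewind.
    seek $fh, 0, 2;
    my $size = tell $fh;
    seek $fh, 0, 0;

    my @entries;
    while (1) {
        my $offset = tell $fh;
        my $got = read( $fh, my $header, 30 );
        last unless defined $got && $got == 30;    # header cut short

        my ( $sig, $version, $flags, $method, $time, $date,
             $crc, $csize, $usize, $nlen, $xlen ) = unpack 'V v5 V3 v2', $header;
        last unless $sig == 0x04034b50;    # not a local file header ("PK\x03\x04")
        last if $flags & 0x0008;           # sizes live in a data descriptor

        $got = read( $fh, my $name, $nlen );
        last unless defined $got && $got == $nlen;    # file name cut short

        # Skip the extra field and the compressed data.
        my $end = tell($fh) + $xlen + $csize;
        last if $end > $size;              # data cut short: drop this file
        seek $fh, $end, 0;

        push @entries, {
            name  => $name,  offset => $offset, version => $version,
            flags => $flags, method => $method, time    => $time,
            date  => $date,  crc    => $crc,    csize   => $csize,
            usize => $usize, end    => $end,
        };
    }
    return @entries;
}

# Usage: perl scan.pl test.zip
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    print "$_->{name}\n" for scan_local_headers($fh);
}
```

The end value of the last returned entry is the point where the intact part of the archive stops, which is where a replacement footer would have to go.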

Re^2: Reading partial/corrupt zip files
by steves (Curate) on Jan 01, 2006 at 15:10 UTC

    You've hit on the key -- that the files are not connected. Looking at the code, Archive::Zip appears to always first access the central directory information, which is at the end of the file. For files that are not fully sent, that never works since it's the last part of the file that's missing. It makes sense to build the code around the central directory -- it's surely faster than parsing the entire zip file to get the pieces that are available. So I think a recovery method would have to try and piece things together the slow way as you state.
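To illustrate why a reader that starts from the central directory fails on a partial download, here is a small sketch that looks for the end-of-central-directory record the way such a reader might: by searching the tail of the file for its "PK\x05\x06" signature. This is not Archive::Zip's actual code, only an approximation of the idea. The name find_eocd is invented, and the search can be fooled if the same four bytes happen to occur inside an archive comment.

```perl
use strict;
use warnings;

# Return the offset of the end-of-central-directory record, or undef
# if the tail of the file does not contain a complete one.
sub find_eocd {
    my ($fh) = @_;
    seek $fh, 0, 2 or return;
    my $size = tell $fh;

    # The record is 22 bytes plus an optional comment of up to 65535 bytes,
    # so it must start within the last 65557 bytes of the file.
    my $tail_len = $size < 65_557 ? $size : 65_557;
    seek $fh, $size - $tail_len, 0 or return;
    my $got = read( $fh, my $tail, $tail_len );
    return unless defined $got && $got == $tail_len;

    my $pos = rindex $tail, "PK\x05\x06";
    return if $pos < 0;                      # no signature: footer is missing
    return if $pos + 22 > length $tail;      # the record itself is cut short
    return $size - $tail_len + $pos;
}
```

On a file that was cut off mid-transfer this returns undef, because the missing part is exactly where the record lives, so there is nothing for a directory-first reader to start from.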