
Re: Reading partial/corrupt zip files

by abcde (Scribe)
on Jan 01, 2006 at 12:11 UTC ( [id://520230]=note )


in reply to Reading partial/corrupt zip files

I am not aware of any library that can read corrupted files, but it might be possible to turn an incomplete file into a complete one by removing the last (partial) file and appending a phony footer.

Look at the ZIP file format:
http://www.pkware.com/business_and_developers/developer/appnote/
The files in the zip are not connected to each other, so it is possible to read through the file, parsing each file as it comes and bailing out if the file ends unexpectedly:
    #!/usr/bin/perl
    use strict;
    use warnings;

    open( my $zip, '<', 'test.zip' ) or die "Cannot open test.zip: $!";
    binmode $zip;    # ZIP data is binary

    sub error { print "The file is corrupt.\n"; exit; }

    # Read exactly $_[0] raw bytes; bail out if the file ends early.
    sub readstr {
        my $got = read( $zip, my $received, $_[0] );
        error() unless defined $got && $got == $_[0];
        return $received;
    }

    # Read a little-endian unsigned integer of $_[0] bytes.
    sub readint {
        my $received = 0;
        $received = $received * 256 + $_ for reverse unpack 'C*', readstr( $_[0] );
        return $received;
    }

    while ( !eof $zip ) {
        my $head           = readstr(4);    # PK^C^D
        my $versions       = readstr(4);    # ...
        my $filenamelength = readint(2);    # ...
        # Parse the rest of this file
        # Until we get an error or go on to the next file header
    }
    close($zip);

My code doesn't actually produce a correct footer for the file, but it should start you off.
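To make the "parse each file as it comes" idea a bit more concrete, here is a rough sketch that walks the local file headers and lists the entries whose data is fully present. The sub name scan_local_headers and the hash keys are made up for illustration. The 30-byte header layout and the 0x04034b50 signature are taken from the APPNOTE. It assumes a plain (non-ZIP64, unencrypted) archive and simply stops at entries that use a trailing data descriptor (general-purpose flag bit 3), since their sizes are not stored in the local header.

```perl
use strict;
use warnings;

# Walk the local file headers of a possibly truncated archive and
# return a hashref for every entry whose data is completely present.
sub scan_local_headers {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    binmode $fh;
    my $size = -s $fh;

    my @entries;
    while (1) {
        my $offset = tell $fh;
        my $got = read( $fh, my $hdr, 30 );
        last unless defined $got && $got == 30;    # header itself is cut off

        # Fixed part of the local file header, all little-endian.
        my ( $sig, $ver, $flags, $method, $time, $date,
             $crc, $csize, $usize, $namelen, $extralen )
            = unpack 'V v5 V3 v2', $hdr;

        last unless $sig == 0x04034b50;    # central directory or garbage
        last if $flags & 0x08;             # sizes are in a data descriptor; not handled

        # Where this entry (header, name, extra field, data) should end.
        my $end = $offset + 30 + $namelen + $extralen + $csize;
        last if $end > $size;              # entry is truncated

        $got = read( $fh, my $name, $namelen );
        last unless defined $got && $got == $namelen;

        push @entries, {
            name            => $name,
            offset          => $offset,
            method          => $method,
            compressed_size => $csize,
        };
        seek( $fh, $end, 0 ) or last;      # jump to the next local header
    }
    close $fh;
    return @entries;
}
```

Something like `print "$_->{name}\n" for scan_local_headers('test.zip');` would then list what can still be recovered. Extracting the data and writing a new central directory is left out.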

Re^2: Reading partial/corrupt zip files
by steves (Curate) on Jan 01, 2006 at 15:10 UTC

    You've hit on the key -- that the files are not connected. Looking at the code, Archive::Zip appears to always first access the central directory information, which is at the end of the file. For files that are not fully sent, that never works since it's the last part of the file that's missing. It makes sense to build the code around the central directory -- it's surely faster than parsing the entire zip file to get the pieces that are available. So I think a recovery method would have to try and piece things together the slow way as you state.
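A quick way to tell whether a download got as far as the central directory is to look for the end-of-central-directory record in the tail of the file. The sketch below is only an illustration: find_eocd_offset is a made-up name, and the record layout (signature "PK\x05\x06", 22 fixed bytes, then a comment of up to 65535 bytes) is taken from the APPNOTE. It assumes a plain, non-ZIP64 archive with nothing appended after the comment.

```perl
use strict;
use warnings;

# Return the offset of the end-of-central-directory record,
# or undef if the file does not end with a well-formed one.
sub find_eocd_offset {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    binmode $fh;
    my $size = -s $fh;

    # The record is 22 bytes plus a comment of at most 65535 bytes.
    my $tail_len = $size < 65_557 ? $size : 65_557;
    seek( $fh, $size - $tail_len, 0 ) or die "seek failed: $!";
    my $got = read( $fh, my $tail, $tail_len );
    close $fh;
    return unless defined $got && $got == $tail_len;

    # Search backwards for the signature and accept a candidate only
    # if its comment-length field reaches exactly to the end of file.
    my $pos = length($tail) - 22;
    while ( $pos >= 0 ) {
        $pos = rindex( $tail, "PK\x05\x06", $pos );
        last if $pos < 0;
        my $comment_len = unpack 'v', substr( $tail, $pos + 20, 2 );
        return $size - $tail_len + $pos
            if $pos + 22 + $comment_len == length $tail;
        $pos--;
    }
    return;
}
```

If this returns undef, the central directory is missing or incomplete, and scanning the local file headers from the start is probably the only option left.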
