coldfingertips has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a file uploader and I'm scripting the ability to upload zip files. I'm only parsing files (images, to be precise); all other data, including directories, is discarded, since I'm not going to parse beyond the root of the .zip file.

Question 1) Does the entire .zip file have to be uploaded to the server prior to reading it? Or can I slurp it into memory? Now that I think about it, that sounds like a bad idea, but it leads to the next question.

Question 2) Servers all have timeouts for scripts that run too long, and I fear that if someone uploads hundreds of MB worth of files in their zip file, my server will cut the script off. This is where it might be a little OT.

I've seen some forums (though I can't remember which) that have admin panels that send out emails. The panel sends out a batch, then redirects to a page with a timeout button/timer on it, then reloads itself and starts again with another chunk of emails, reloads with a new timeout button/timer, and so on. When I have over 10,000 emails to send, it reloads the page about 10 times, I think.

How is that done? I'd like to use that same idea if the user is uploading a huge zip file.


Replies are listed 'Best First'.
Re: Zip file extraction (perhaps a little OT)
by Corion (Patriarch) on Sep 05, 2007 at 14:12 UTC

    The directory of a .zip file comes at the very end of the file, so you will have to wait to process it until the file has been received completely, at least if you want to use the standard tools for zip files.
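    A rough sketch of that approach with Archive::Zip, assuming the complete upload has already been written to a temporary path (the path and the image-extension filter below are only illustrative), might look like:

        use strict;
        use warnings;
        use Archive::Zip qw(:ERROR_CODES);

        # Assumes the complete upload has already been saved to $zip_path;
        # Archive::Zip needs the central directory at the end of the file.
        my $zip_path = '/tmp/upload.zip';    # illustrative path

        my $zip = Archive::Zip->new;
        die "Cannot read $zip_path\n" unless $zip->read($zip_path) == AZ_OK;

        for my $member ( $zip->members ) {
            next if $member->isDirectory;                  # skip directories
            my $name = $member->fileName;
            next if $name =~ m{/};                         # root-level entries only
            next unless $name =~ /\.(?:jpe?g|gif|png)$/i;  # images only
            $zip->extractMember( $member, "images/$name" );
        }

    Reading from disk once the upload has finished also sidesteps the question of whether the whole archive fits comfortably in memory.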

    For your other question, see merlyn's Watching long processes through CGI.

      As I understood the OP's question, the "long process" in this case is the upload itself, so I don't know whether merlyn's column applies here. On the other hand, I remember that Apache times out only on idle connections, so a very long upload (where data continues to come in) should not be a problem.

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
        I might be wrong, but I don't think that's the case. I once wrote a bot that scraped a page and submitted a dynamic form 10 times per search query, for sometimes hundreds of searches/page scrapes from a slow server, and it often gave up after a few minutes. I also created a link popularity script to parse search engines, and that occasionally timed out as well.

        In reality I need to find a way to break up the uploaded zip archive, process it for X minutes (or X MB, or X files), load a page with a meta refresh, and then pick up where I left off.
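        Something along these lines is one way to sketch it (the offset parameter, batch size, and paths are all made up for illustration): process a fixed number of archive members per request, then emit a meta refresh pointing back at the script with the next offset until everything has been handled.

            #!/usr/bin/perl
            use strict;
            use warnings;
            use CGI;
            use Archive::Zip qw(:ERROR_CODES);

            # Illustrative values only.
            my $zip_path   = '/tmp/upload.zip';  # the already-received archive
            my $batch_size = 25;                 # members handled per request

            my $q      = CGI->new;
            my $offset = $q->param('offset') || 0;

            my $zip = Archive::Zip->new;
            die "Cannot read $zip_path\n" unless $zip->read($zip_path) == AZ_OK;

            my @members = grep { !$_->isDirectory } $zip->members;
            my $last    = $offset + $batch_size - 1;
            $last = $#members if $last > $#members;

            for my $member ( @members[ $offset .. $last ] ) {
                # ... extract/parse one image here ...
            }

            print $q->header;
            if ( $last < $#members ) {
                my $next = $last + 1;
                my $url  = $q->url . "?offset=$next";
                # Reload in a couple of seconds and continue with the next batch.
                print qq{<html><head><meta http-equiv="refresh" content="2;url=$url"></head>},
                      qq{<body>Processed @{[ $last + 1 ]} of @{[ scalar @members ]} members so far.</body></html>};
            }
            else {
                print '<html><body>Done.</body></html>';
            }

        The same trick should work whether the unit of work is a batch of emails or a batch of zip members; the only state that has to survive between requests is the offset.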