Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, all.

I have a mod_perl script that handles a multipart/form-data request in which a user has attached a series of files for uploading. Unfortunately, some of those files can be quite large, so instantiating the query object takes forever. Based on my debugging, I'm guessing that all the request data (including the uploaded file data) has to be pulled in before the query object is created.

Is there any way around this? I have a way of keeping the user notified of upload progress, but it requires some of the parameters being passed in through the form. If I have to wait until all the data is transferred, keeping the user posted on progress is pretty useless. Is there any way to do "my $q = new CGI;" without it taking the time to process _all_ the file data, do some pre-processing on the non-upload parameters, and _then_ go handle the uploaded files? Alternatively, is there a way to have "new CGI" process in the background?

Thanks a bunch, monks.

Re: Deferred/selective processing in CGI.pm
by tachyon (Chancellor) on Oct 30, 2004 at 11:44 UTC

    There are all sorts of technical problems with this. First you need to understand that what you think of as some form params and some files arrives at the server as a single text stream that looks like this:

    # a form
    <form method="POST"
          action="http://domain.com/cgi-bin/dump.txt?sending=query&string=as&well"
          enctype="multipart/form-data">
      <input type="file" name="upload_file1">
      <input type="text" name="text1">
      <input type="text" name="text2">
      <input type="text" name="text3">
      <input type="file" name="upload_file2">
      <input type="submit" value="Submit" name="submit">
    </form>

    # a very simple script
    [root@devel3 cgi-bin]# cat dump.txt
    #!/usr/bin/perl
    print "Content-Type: text/plain\n\n";
    print "QUERY STRING: ", $ENV{QUERY_STRING}, "\n\n\n";
    print while <>;

    # what actually happens
    QUERY STRING: sending=query&string=as&well

    -----------------------------7d43d8850366
    Content-Disposition: form-data; name="upload_file1"; filename="C:\Documents and Settings\Administrator\My Documents\My Pictures\120x200-oscon04.gif"
    Content-Type: image/gif

    GIF89a..!..,.....D;
    -----------------------------7d43d8850366
    Content-Disposition: form-data; name="text1"

    foo
    -----------------------------7d43d8850366
    Content-Disposition: form-data; name="text2"

    bar
    -----------------------------7d43d8850366
    Content-Disposition: form-data; name="text3"

    baz
    -----------------------------7d43d8850366
    Content-Disposition: form-data; name="upload_file2"; filename="C:\Documents and Settings\Administrator\My Documents\My Pictures\bckgrnd.gif"
    Content-Type: image/gif

    GIF89a..,.......;
    -----------------------------7d43d8850366
    Content-Disposition: form-data; name="submit"

    Submit
    -----------------------------7d43d8850366--

    So there are several take-home points. First, if you can arrange for your 'vital' form data to go via the query string, you are guaranteed it will be immediately available in $ENV{QUERY_STRING}. All you should need is the SESSID (that is all you need, right?), so that should be an option.

    Next, you will note that the order of the parts in the text stream mirrors the order of the form fields, so in the example above you only get the text params after the first file has streamed in.

    Finally, you need to know that the first thing a CGI/CGI::Simple object does when you call new() is to strip the POST data off STDIN. Data on STDIN is *gone* once it is read, so you can't have a sneak peek, as it were. The CGI parsers need the whole stream, so you really need to use $ENV{QUERY_STRING} to get your vital data and delay the call to new(). If you can do that, you can fork off a child to look after the incoming file data and then it is just an IPC problem.

    Note that you can call new like this: my $q = CGI->new($ENV{QUERY_STRING}) and your object will contain just the query-string parse, not a parse of any data on STDIN.
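
    For what it's worth, a minimal sketch of that approach (the SESSID parameter and the progress bookkeeping are hypothetical, just for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;

    # Parse ONLY the query string; STDIN (the multipart body) is
    # untouched, so the uploaded files have not been consumed yet.
    my $pre    = CGI->new($ENV{QUERY_STRING});
    my $sessid = $pre->param('SESSID');    # hypothetical 'vital' param

    # ... set up progress tracking keyed on $sessid here, or fork off
    # a child to watch the upload and report back via IPC ...

    # Only now pay the cost of reading and parsing the whole POST
    # body, uploaded files included.
    my $q = CGI->new;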

    cheers

    tachyon

Re: Deferred/selective processing in CGI.pm
by tilly (Archbishop) on Oct 30, 2004 at 17:07 UTC
    You can find the length of the upload by looking at $ENV{CONTENT_LENGTH}. If you register an upload hook before attempting to read any parameters (pass the hook to CGI->new with the OO interface, or call CGI::upload_hook first with the function-oriented interface), then you can arrange for code to be run while the files are uploading.

    That code can try to send notifications back to the user, and you'll know exactly where you are in the process.

    Note that you'll have to be very careful of buffering. It is very easy to accidentally buffer your output at various points (in your script or in the webserver), which would result in the user not getting any data until you're all done and ready to send the main page.
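
    A minimal sketch of the hook approach under the OO interface (assuming CGI.pm 3.x or later, which has the hook feature; the percentage is approximate, since CONTENT_LENGTH also counts the MIME boundaries and part headers):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI;

    $| = 1;    # unbuffer STDOUT so progress actually reaches the client

    my $total = $ENV{CONTENT_LENGTH} || 0;
    my $seen  = 0;

    print "Content-Type: text/plain\n\n";

    # Called repeatedly while CGI.pm reads the upload from STDIN.
    # Per the CGI.pm docs the arguments are: the upload's filename,
    # the chunk of data just read, bytes read so far for this file,
    # and the extra data (if any) passed to new().
    sub hook {
        my ($filename, $buffer, $bytes_read, $data) = @_;
        $seen += length $buffer;
        printf "%s: %d/%d bytes (~%.0f%%)\n",
            $filename, $seen, $total,
            $total ? 100 * $seen / $total : 0;
    }

    # Passing the hook to new() registers it before STDIN is parsed.
    my $q = CGI->new(\&hook);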

      Thanks, all... These are great suggestions and have definitely given me plenty to work with. I've got a fairly workable solution in place courtesy of T's suggestion to use the QUERY_STRING environment variable. Between that and CONTENT_LENGTH, I'm pretty much dialed in at this point. Thanks, monks!!
Re: Deferred/selective processing in CGI.pm
by Your Mother (Archbishop) on Oct 30, 2004 at 05:04 UTC

    A possible strategy for this can be (pseudo-code!):

    $| = 1;  # keep info flowing to the browser as soon as it's there

    # Do some piece of long-running stuff; this will
    # prevent apache from timing out the connection
    # but might need to be called often.
    $r->reset_timeout();

    # At the same time check that the user still cares.
    $r->connection->aborted() and stop_it_all();

    # While in a browser nearby, some clever JavaScript is
    # showing "Processing" "dot" "dot" "dot" and can
    # be dynamically fed updates by something like
    $r->print('<div class="status_' . $count++ . '">+</div>');

    # which goes straight to the JS b/c of $|.
    # Probably something client side is the only way to
    # give decent status "progress" messaging; the
    # mod_perl can send approximations of how much is
    # left to go.

    This is a sticky problem. It might be better served with an email-receiving queue to take the files, or by breaking the job into single-file uploads instead of letting a user upload, whatever, 10 large files at once.

    It seems likely that, because of all the up-front work CGI.pm does, you might need to (probably should) write the upload handling in native mod_perl so you have complete control; see the sketch below. I don't usually use CGI.pm with mod_perl, so I'm sorry, that's just a guess. It's a fantastic module, but mod_perl gives you all its facilities (well, not as easily) and much finer control at every level of the request; not just reading the already-accepted request and sending the response.
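
    To give a rough idea only, here is a mod_perl 1.x handler sketch that reads the request body itself in chunks instead of letting CGI.pm slurp it; spool_chunk() and record_progress() are hypothetical stand-ins for whatever storage and status mechanism you use:

    package My::Upload;
    use strict;
    use warnings;
    use Apache::Constants qw(OK);

    sub handler {
        my $r = shift;
        my $remaining = $r->header_in('Content-length') || 0;

        # Read the body as it arrives; we see every chunk, so we can
        # update progress and bail out early whenever we like.
        while ($remaining > 0) {
            my $want = $remaining < 8192 ? $remaining : 8192;
            my $got  = $r->read(my $buf, $want) or last;
            $remaining -= $got;
            # spool_chunk($buf);            # hypothetical: write to disk
            # record_progress($remaining);  # hypothetical: for the JS poller
        }
        return OK;
    }
    1;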

    Good luck. (update: added close to div tag, #2: speling problamz)