atey1 has asked for the wisdom of the Perl Monks concerning the following question:

Hello. I have a Perl CGI script that I use to process a zip file and then create a new zip output for download. I want to do this without writing the output zip file to disk. The zip output is actually raw binary data, not a file.

When I print the binary zip data with application/zip content headers, it works fine, the zip downloads, no problem. However, I'd like to refresh the current page and then output the zip data. In order to do this, I pass the binary data as params and call on the script again to output the data. When I do this, the zip file gets corrupted, I think because the parameters are getting read as text instead of binary data.

My question is, is there a way to pass raw binary data as form parameters and have a script read them as raw binary data?

In case you're wondering: yes, I could save it to disk and solve my problems, but I want to avoid that if possible.

Below is the form that I use to pass the separate headers and the binary data.

print $cgi->start_multipart_form(-action=>'site.cgi', -method=>"POST", -id=>"hiddenform");
print $cgi->hidden('theoutput',"$_[0]");   # the binary data
print $cgi->hidden('theheader',"$_[1]");   # the header data
print $cgi->end_multipart_form;

As a test, I can write $_[0] to a file and it works fine:

open FILE, ">", "uploaded/anything.zip" or die $!;
print FILE $_[0];
close FILE;

and anything.zip is a working zip file.

But when I obtain the parameters and try the same thing, the zip file gets corrupted.

my $content = $cgi->param('theoutput');
open FILE, ">", "uploaded/anything.zip" or die $!;
print FILE $content;
close FILE;

EDIT: Perhaps I should clarify what I actually want to do; maybe there's a better solution out there.

I have a script that allows a user to upload a zip file through a web form (we'll call it the Upload Page). It then processes that zip file using an external Java program, which outputs either raw HTML or raw binary data. While the Java program processes the upload, a div overlaid on the Upload Page becomes visible to show that the request is being processed.

Once the Java process is done, the script prints the output as a new page. With the HTML output this works great: it prints as a new page which replaces the old Upload Page. The problem is with the zip file: the original Upload Page is still visible, and so is the "loading" div.

What I'd like to do in the case of the zip output is make the original Upload Page visible as it was before.

So far, these are the options I've tried:

1. Removing the "loading" div once the Java process completes. This doesn't work because, I believe, once the form is submitted I no longer have access to the elements of the original page.

2. Outputting the HTML of the original Upload Page as a new page, and then outputting the zip binary data. This doesn't work because it just shows raw binary data printed at the bottom of the page.

3. Outputting the HTML of the original Upload Page as a new page, and then putting the zip binary data as the source of a hidden iframe. This doesn't work because I couldn't figure out how to have the iframe source be non-HTML, binary zip data.

4. Outputting the original Upload Page as a new page, and then reloading the page with the binary zip data passed as parameters, to be printed on the page reload. (That was the topic of the original post, which isn't working, as I've described already.)

5. I thought maybe CGI::Ajax might be the way to go, but I've looked into it and am not sure how I can use it to my advantage.

The other option I thought of is some way of calling some sort of -onsubmitcomplete=> option for my form, meaning that once the script is done processing the upload form submission, I take down the div, or reload the page, or whatever. But I don't think that exists.

I really have no idea what else to try. I really thought I had it with the passing of the binary data as a parameter.

Replies are listed 'Best First'.
Re: Cgi params as binary data
by davido (Cardinal) on Jun 04, 2011 at 17:03 UTC

    CGI.pm to the rescue. Files uploaded via the CGI upload form element create a temporary file on your system already (this is a CGI.pm behavior). The file is automatically unlinked when the script cleans up after itself (behind the scenes). The canonical way to handle uploads is to immediately save them to your own explicit file. But in your case, you're looking for a way to avoid saving the file explicitly. Fortunately, CGI.pm allows you to access the temp file directly. See the following, clipped from the CGI.pm documentation:

    When processing an uploaded file, CGI.pm creates a temporary file on your hard disk and passes you a file handle to that file. After you are finished with the file handle, CGI.pm unlinks (deletes) the temporary file. If you need to you can access the temporary file directly. You can access the temp file for a file upload by passing the file name to the tmpFileName() method:

    $filename = $query->param('uploaded_file');
    $tmpfilename = $query->tmpFileName($filename);

    The temporary file will be deleted automatically when your program exits unless you manually rename it. On some operating systems (such as Windows NT), you will need to close the temporary file's filehandle before your program exits. Otherwise the attempt to delete the temporary file will fail.

    So what you could do is allow a file upload, and then process the file in-script without ever saving it explicitly to disk. Be sure to limit the upload file size to something reasonable by setting $CGI::POST_MAX to a value that is large enough for your needs, and small enough to prevent Denial of Service Attacks. Also please remember that any script running via a webserver is potentially running many instances at the same time (Apache forks off multiple processes to accommodate many clients simultaneously). So any memory usage your script creates could potentially be multiplied by the number of site users hitting your script at the same time. The result could be that a script which is already a memory hog, when multiplied by a few instances, could create a real problem on your server. That's one reason why it's common practice to keep large data in files or in a database rather than dealing with them entirely in memory.


    Dave

      Okay, the problem is, I don't actually have a param('uploaded_file'). There's no file-NAME associated with the binary data I want to output.

      Once I process the uploaded zipfile, the processed output is raw binary data that gets generated (What's really going on, is that I feed the uploaded file to an external java program, then that java program spits out raw binary data, that raw binary data is a zip file)

      Maybe I can do something like upload($binary_contents) or something like that?

      Regarding the DoS attacks, it's a script that would be used by only a handful of people internally, so that's not really something I'm concerned about. We should be okay with a slow-running, memory-hogging script. :-)

        With all the support out there for storing session data (including larger files), why the strong aversion to doing it the traditional way? If you need to maintain data across page loads, you're maintaining a session's state. CGI::Session helps with that. And it's just one of many. Also Apache::Session, and many others. Dealing with session data becomes simple if you use the right tool.
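
A minimal sketch of the "keep the data server-side, pass only a key" idea, using a plain temp file rather than CGI::Session (so this is a swapped-in, simpler technique, not the module's API). The helper names `stash_output` and `fetch_output` and the `zipout_` prefix are hypothetical; only core modules are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: write the bytes to a temp file and return a token
# (the file's base name) that can travel in a hidden field or a URL.
sub stash_output {
    my ($bytes) = @_;
    my ($fh, $path) = tempfile('zipout_XXXXXXXX', TMPDIR => 1, UNLINK => 0);
    binmode $fh;                      # no newline or encoding translation
    print {$fh} $bytes;
    close $fh or die "close $path: $!";
    return basename($path);
}

# Hypothetical helper: validate the token, read the bytes back, and
# delete the file so nothing lingers on disk.
sub fetch_output {
    my ($token) = @_;
    $token =~ /\Azipout_\w{8}\z/ or die "bad token";
    my $path = File::Spec->catfile(File::Spec->tmpdir, $token);
    open my $fh, '<', $path or die "open $path: $!";
    binmode $fh;
    my $bytes = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    unlink $path;
    return $bytes;
}

# Stand-in payload; in the real script this would be the program's output.
my $token = stash_output("PK\x03\x04\x00\xff binary \r\n payload");
my $bytes = fetch_output($token);
```

The first request would call `stash_output` and emit only the token; the follow-up request would call `fetch_output` and print the bytes. The token check keeps a caller from naming arbitrary paths, but for anything beyond internal use a real session module is the safer choice.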


        Dave

Re: Cgi params as binary data
by moritz (Cardinal) on Jun 04, 2011 at 15:07 UTC
    Do you really want to pass around a whole zip file as a parameter string? Sounds like a very inefficient method to me.

    Anyway, the problem is likely that $cgi->hidden doesn't make any effort to escape any non-printable characters.

      I don't necessarily want to, but I don't want to explicitly save to a file on disk. Any alternatives? What about something like $cgi->upload() using binary data instead of a filename? Does that exist?
Re: Cgi params as binary data
by 7stud (Deacon) on Jun 04, 2011 at 17:46 UTC
    You can always base64 encode the data--but that inflates the size by 1/3.
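
A small sketch of that round trip with the core MIME::Base64 module. The 256-byte string is only a stand-in for the real zip data, and the variable names are illustrative.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Stand-in for the zip bytes: every possible byte value once.
my $binary = join '', map { chr } 0 .. 255;

# The empty second argument suppresses the newline that encode_base64
# otherwise appends after every 76 characters of output.
my $encoded = encode_base64($binary, '');

# $encoded now holds only A-Z, a-z, 0-9, '+', '/' and '=', so it
# survives being written into a hidden form field.
my $decoded = decode_base64($encoded);

printf "raw: %d bytes, encoded: %d chars, round trip %s\n",
    length $binary, length $encoded,
    $decoded eq $binary ? 'ok' : 'FAILED';
```

The receiving script would call `decode_base64` on the parameter value before printing it with the zip headers.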
Re: Cgi params as binary data
by Khen1950fx (Canon) on Jun 05, 2011 at 05:31 UTC
    Your question is a little unclear to me. When something isn't clear to me, I start debugging and debug until it is clear. What I tried here was to pass the params to the filehandle. It'll save the data without putting it on the disk. You need to binmode the filehandle in order for it to work; also, you need to call the new method twice. Carp and CGI::Carp::DebugScreen are required.

    #!/usr/bin/perl
    use warnings;
    use Carp;
    use CGI qw(param);
    use CGI::Carp::DebugScreen;

    open( OUT, '>>/root/Desktop/zip.out' );
    CGI::Carp::DebugScreen->debug(1);
    my $q = CGI->new;
    binmode OUT, ':raw';
    my $name = $q->param('content_of_zip') || "Content";
    my $out = $q->param('output') || "Binary";
    $q->save(\*OUT);
    close OUT;

    # now double-check to see if it was saved.
    open( IN, '/root/Desktop/zip.out' );
    while ( not eof IN ) {
        $q = CGI->new;
        print $name, "\n", $out, "\n";
        exit 0;
    }
    close IN;
      also, you need to call the new method twice.

      you don't need to do it in a loop

        Alright, I'm being unclear. I'll try to be simple.

        1. user uploads zip in a form

        2. on form submit (action=site.cgi) zip is processed using some external program

        3. the output of the external program can be either a zip or HTML (raw output, not a file)

        4. if I just do "print $my_program_output", the html output will reload/replace the page, but the zip will just present a zip file for download without reloading the page behind it (this works fine, without having to write the output to a file)

        Here's my problem: in the case of the zip file, I want to refresh the original page (or have access to its content with JavaScript), but I don't know how to do that. And I want to avoid having to open a new file and write to it.
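
One possible shape for this, sketched under assumptions: the script answers the upload with a fresh copy of the page whose meta refresh triggers a second request, and that second request returns the zip as an attachment, which normally leaves the displayed page in place. It assumes the zip bytes are kept somewhere the second request can reach them under a URL-safe token. The `download` parameter, the token value, and both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: HTML for the restored upload page. The meta
# refresh makes the browser fetch the zip in a separate request.
sub page_with_download {
    my ($token) = @_;
    return <<"HTML";
<html>
<head>
<meta http-equiv="refresh" content="0; url=site.cgi?download=$token">
</head>
<body>
<!-- the original upload form would be printed here -->
</body>
</html>
HTML
}

# Hypothetical helper: a complete CGI response (headers, blank line,
# body) that the browser treats as a file download.
sub zip_response {
    my ($bytes, $filename) = @_;
    return "Content-Type: application/zip\r\n"
         . "Content-Disposition: attachment; filename=\"$filename\"\r\n"
         . "Content-Length: " . length($bytes) . "\r\n"
         . "\r\n"
         . $bytes;
}

# Illustrative values only.
my $html = page_with_download('zipout_AbCd1234');
my $resp = zip_response("PK\x03\x04", 'result.zip');
```

In the script itself, the first request would print a `text/html` header followed by `$html`, and the second would call `binmode STDOUT` before printing `$resp` so the bytes are not altered on output.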

Re: Cgi params as binary data
by 7stud (Deacon) on Jun 05, 2011 at 03:37 UTC

    That's as clear as mud. Why don't you do this: give us 3 bytes of the binary data, and then type out exactly how you want that displayed on your webpage. Do not explain anything. Do not clarify. Do not pontificate. All your post should contain is:

    1) This is 3 bytes of my binary data.

    2) This is how I want it to look on my web page.