Talroot has asked for the wisdom of the Perl Monks concerning the following question:

I've got a website set up for employees to upload large files so that they can be downloaded by external users. The files sent through this site are routinely 2+ GB, and I'm looking for a way to streamline the processing.

I'm using swfupload on the client end to make it easy for the users, and it has been working happily for many years. Recently, however, the files have been getting even larger, and I've noticed that the way CGI is handling the uploads isn't very efficient.

The upload is streamed into a temporary directory on the c: drive. Once the client finishes POSTing, it would seem that the file is copied to its final location on the d: drive and then deleted from c:. I tried changing the temp location to d:, but I still have to wait while the OS makes a copy of the 3 GB file before deleting the CGITemp one.

Aside from taking double the disk space during the process, the copy delay can be long, and the client sees their upload stuck at 100% (via swfupload).

So my question is: how do I force CGI to write directly to the final desired handle and location, bypassing the CGITemp step? I looked into the hook feature, but as far as I understand it, that is only a means to monitor the upload process. I tried setting 'use_tempfile' to 0, and as far as I could tell it was successful, but that just made the CGITemp file write to c:\windows\temp instead.

Re: CGI upload efficiency
by sauoq (Abbot) on May 08, 2012 at 01:22 UTC
    I looked into the hook feature, but as far as I understand it, that is only a means to monitor the upload process.

    No, you don't understand it completely. The upload hook is a callback that you create. It is called repeatedly while your file is uploading. The $buffer argument (see the docs) is how you get at the uploaded data. You could, for instance, write that out to whatever file you choose.
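
    For instance, here's a minimal sketch of a hook that streams each chunk straight to its final destination. (The d:/uploads path, the save_hook name, and the open-on-first-chunk bookkeeping are my assumptions, not anything CGI.pm mandates; adjust to taste.)

    use CGI;
    use File::Basename qw(basename);

    my %out;    # open output filehandles, keyed by upload filename

    sub save_hook {
        my ($filename, $buffer, $bytes_read, $data) = @_;
        my $dest = 'd:/uploads/' . basename($filename);
        unless ($out{$filename}) {
            # First chunk of this file: open the destination once.
            open $out{$filename}, '>', $dest or die "Can't open $dest: $!";
            binmode $out{$filename};    # uploads are binary data
        }
        print { $out{$filename} } $buffer;
    }

    # The trailing 0 disables the CGITemp file entirely.
    my $q = CGI->new(\&save_hook, undef, 0);
    print $q->header('text/plain'), "Upload saved.\n";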

    A couple of other things... With the way you are currently doing it, I'm pretty sure CGI.pm should just be renaming the temporary file. If the source and destination drives are the same, that should be nearly instantaneous regardless of file size.† Using a different drive forces it to copy the data, which is not what you want. Also, it's not documented and therefore not advisable, but you can set $CGITempFile::TMPDIRECTORY to choose a different directory for your temp files.
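
    If you do try that, the assignment has to happen before CGI->new parses the upload (as far as I can tell), e.g.:

    use CGI;
    $CGITempFile::TMPDIRECTORY = 'd:/cgitemp';   # undocumented internal; may break on upgrade
    my $q = CGI->new;                            # temp file should now land on d: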

    Here's a quick sample script so you can see what the upload hook does. Upload a smallish text file to see it work. The $data argument can be anything you want. (You might want to pass a filename or filehandle that you'll use within the hook to write the data to, for example.)

    #!/usr/bin/perl
    use CGI;

    # The hook runs repeatedly inside CGI->new while the upload is
    # being read; the trailing 0 disables the CGITemp file.
    my $q = CGI->new(\&hook, 'some arbitrary data', 0);
    print $q->header();
    form_page();

    my $header_printed = 0;

    sub hook {
        # We're called before the header() line above is reached,
        # so print our own header on the first chunk.
        print "Content-type: text/plain\n\n" unless $header_printed;
        $header_printed = 1;
        my ($filename, $buffer, $bytes_read, $data) = @_;
        print "Read $bytes_read bytes of $filename\n";
        print "My \$data = $data\n";
        print "Buffer length: " . length($buffer) . "\n";
        print $buffer . "\n";
    }

    sub form_page {
        print $q->start_html();
        print $q->start_multipart_form();
        print $q->filefield(
            -name      => 'uploaded_file',
            -default   => 'x',
            -size      => 50,
            -maxlength => 80,
        );
        print $q->submit;
        print $q->end_form;
        print $q->end_html;
    }

    † I don't do Windows. Maybe I'm giving it too much credit for being sane here.

    Update: Added example code.

    -sauoq
    "My two cents aren't worth a dime.";

      Thanks for the example, I'll try it straight away.

      my $q = CGI->new(\&hook, 'some arbitrary data', 0);

      If I read this correctly, the '0' tells CGI not to write data to its normal file, and therefore the bits will only end up going wherever &hook directs them?

      Regarding the temp file location, I had trouble changing it in code, but setting the TMPDIR environment variable in the OS worked for what I needed.
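
      For anyone else who lands here, the same thing can be done from within the script itself; a sketch, assuming CGI.pm reads TMPDIR when it picks its temp directory and that the variable is set before CGI.pm loads:

      BEGIN { $ENV{TMPDIR} = 'd:/cgitemp' }   # must run before CGI.pm is loaded
      use CGI;                                # CGI.pm picks up TMPDIR from the environment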

      The behavior I saw was that even on the same drive, CGI made a copy of the file and then unlinked the CGITemp one. On a 2 GB file the delay was quite noticeable, so I used Sysinternals Procmon to verify that the data was in fact copied, even when on the same disk. It could work differently on Linux, no doubt. It really wouldn't be an issue either way if the files were a more reasonable size ;)

        If I read this correctly, the '0' tells CGI not to write data to its normal file, and therefore the bits will only end up going wherever &hook directs them?

        Yes. If you instantiate CGI.pm this way, it won't write the uploaded data to a temp file. Your only chance to do something with it is as you get it, chunk by chunk, in your callback routine (the hook function).

        -sauoq
        "My two cents aren't worth a dime.";

      Also, it's not documented and therefore not advisable, but you can set $CGITempFile::TMPDIRECTORY to choose a different directory for your temp files.

      IIRC it has been officially documented for at least a year.

Re: CGI upload efficiency
by tinita (Parson) on May 07, 2012 at 22:59 UTC
    look for "upload_hook" in the CGI documentation, this lets you handle the upload without a temp file.

    edit:
    I looked into the hook feature, but as far as I understand it, that is only a means to monitor the upload process. I tried setting 'use_tempfile' to 0, and as far as I could tell it was successful, but that just made the CGITemp file write to c:\windows\temp instead.
    Then that's a bug, because the docs say:
    The $use_tempfile field is a flag that lets you turn on and off CGI.pm's use of a temporary disk-based file during file upload. If you set this to a FALSE value (default true) then param('uploaded_file') will no longer work, and the only way to get at the uploaded data is via the hook you provide.
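
    The docs also describe a function-oriented form which, if I'm remembering them correctly (verify against your CGI.pm version), must be called before param() or any other CGI function touches the request:

    use CGI qw(:standard);
    # Same hook signature as the OO interface; the trailing 0
    # again disables the temp file.
    CGI::upload_hook(\&hook, 'some arbitrary data', 0);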