tospo has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,
When I want to upload a file in a CGI script, I have always used the technique described in all the tutorials I have found so far: get the filehandle of the uploaded temp file, loop over it with a while loop, and print each line to a newly opened filehandle, like this:

my $upload_filehandle = $query->upload("TheFile");
open ( UPLOADFILE, ">$target_filename" ) or die "$!";
binmode UPLOADFILE;
while ( <$upload_filehandle> ) {
    print UPLOADFILE;
}
close UPLOADFILE;

Now I have recently seen some code where the name of the tempfile that CGI.pm creates is accessed directly, which is also documented in the CGI.pm docs:

$filename = $query->param('uploaded_file');
$tmpfilename = $query->tmpFileName($filename);

So the code I've seen simply moves this file to a new location with a single command (rename or File::Copy) instead of looping over its contents and printing them to a newly opened filehandle.
To me, moving the file seems a) simpler (less code) and b) faster than the loop construct. But since all the tutorials I have found teach the first version, I am a bit uneasy about using the move approach in case there is a good reason that I am missing.
Could somebody please shed some light on this for me? Thanks in advance!
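
For concreteness, the move approach might look like the minimal sketch below. The helper name store_upload and its arguments are made up for illustration; in a real script $tmp_path would be the value returned by $query->tmpFileName($filename) as shown above. It relies only on File::Copy's move, which renames the file when it can and otherwise copies it and deletes the original.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Hypothetical helper (not part of CGI.pm): relocate an uploaded temp file
# to its final path. $tmp_path is assumed to be the temp file's path.
sub store_upload {
    my ($tmp_path, $target_path) = @_;

    # move() renames when possible and falls back to copy-and-delete,
    # for example when source and target are on different filesystems.
    move($tmp_path, $target_path)
        or die "Cannot move '$tmp_path' to '$target_path': $!";

    return $target_path;
}
```

Called as store_upload($tmpfilename, $target_filename), this would replace the whole open/while/print/close sequence.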

Re: CGI file upload: Print to filehandle vs move file
by rowdog (Curate) on Oct 16, 2009 at 22:06 UTC

    Yes, renaming a file is much faster and quite appropriate for things like image files.

    binmode UPLOADFILE; while ( <$upload_filehandle> )
    You have a potential denial of service attack when you slurp the entire uploaded file into RAM, which is why CGI recommends using read instead.
    while ($bytesread = $io_handle->read($buffer,1024)) {
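
    A minimal sketch of the chunked approach, assuming the upload handle behaves like an ordinary read handle. The helper name copy_in_chunks and the 8192-byte default are assumptions made for illustration, not anything CGI.pm provides; in a CGI script the first argument would be the handle returned by $query->upload("TheFile").

```perl
use strict;
use warnings;

# Hypothetical helper: copy from an open read handle to a target path in
# fixed-size chunks, so memory use stays bounded whatever the file size.
sub copy_in_chunks {
    my ($in_fh, $target_path, $chunk_size) = @_;
    $chunk_size ||= 8192;    # assumed default, chosen only for illustration

    open my $out_fh, '>', $target_path
        or die "Cannot open '$target_path': $!";
    binmode $in_fh;
    binmode $out_fh;

    my $total = 0;
    while (1) {
        # read() returns the byte count, 0 at end of file, undef on error.
        my $bytes_read = read($in_fh, my $buffer, $chunk_size);
        die "Read error: $!" unless defined $bytes_read;
        last if $bytes_read == 0;
        print {$out_fh} $buffer or die "Write error: $!";
        $total += $bytes_read;
    }

    close $out_fh or die "Cannot close '$target_path': $!";
    return $total;    # number of bytes copied
}
```

    Unlike a line-by-line loop, this never holds more than one chunk in memory, even when the file contains no newlines at all.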

      Hi rowdog,
      Thanks for your reply and for confirming that moving the file makes sense. However, I'm not sure I fully understand the connection to the code and the denial of service attack in your post - reading the file into memory would be exactly what I don't want to do anyway, right? I would simply use the 'rename' command to move the whole file without ever looking at it (at least at the upload stage - later of course I will do some checks as explained in my reply above).

      Thanks.

        Sorry, my post wasn't really so clear. The off-topic point was that setting binmode on a filehandle and then calling readline on it (<$upload_filehandle>) slurps the entire file into RAM, which is bad if you run out of RAM.

        The real point was that, if you don't need to process the file as it comes in, it's much faster to just rename it. I would probably solve this problem by renaming the temp file but I second dhoss's suggestion to look at CGI::Upload before you decide.

Re: CGI file upload: Print to filehandle vs move file
by stonecolddevin (Parson) on Oct 19, 2009 at 19:13 UTC

    Check out CGI::Upload. Typically it's nice to upload the file to a temp directory, check that it's sane/untainted, then copy it to a permanent location.
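
    A rough sketch of that stage, check, copy flow using only core modules (it does not use CGI::Upload). Everything named here is hypothetical: safe_name, accept_upload, the 10 MB limit, and the file name pattern are illustrative policy choices, and "sane" is reduced to a name check and a size check.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;

# Hypothetical size limit, chosen only for illustration.
my $MAX_BYTES = 10 * 1024 * 1024;

# Return a safe base name, or undef if the client-supplied name is not
# acceptable. Capturing through a regex is also how a value is untainted
# when running under -T.
sub safe_name {
    my ($client_name) = @_;
    return undef unless defined $client_name;
    return $client_name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.-]{0,99})\z/ ? $1 : undef;
}

# Check a staged file, then copy it to its permanent directory.
sub accept_upload {
    my ($staged_path, $client_name, $final_dir) = @_;

    my $name = safe_name($client_name);
    die "Rejected: unacceptable file name\n" unless defined $name;

    my $size = -s $staged_path;
    die "Rejected: empty or missing file\n" unless $size;
    die "Rejected: file too large\n" if $size > $MAX_BYTES;

    my $final_path = File::Spec->catfile($final_dir, $name);
    copy($staged_path, $final_path) or die "Copy failed: $!";
    return $final_path;
}
```

    The name pattern rejects path separators and leading dots, so a client-supplied name such as "../x" cannot escape the target directory.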

    mtfnpy

      Thanks!
      CGI::Upload seems like a good idea. So I guess, in summary, it's fair to say that the only reason the upload procedure is usually taught with the loop over the file's contents is that it gives you the chance to read the file and give some feedback to the user; if this is not required, then simply copying or moving the file is acceptable.
      Thanks everybody!
Re: CGI file upload: Print to filehandle vs move file
by Anonymous Monk on Oct 16, 2009 at 16:58 UTC
    Are you sure you want to move the file? Without looping over it to check its content?

      I don't want to check the contents of the file at the upload stage for two reasons:

      1. These files are quite large, and it would take too long to do the kind of checks that I need to do on them.
      2. The uploaded files are processed asynchronously by a back-end script (on a remote cluster, actually). At that stage, the files are checked and the user gets feedback if there was something wrong with them.
      Therefore, moving the file seems the obvious solution, but I was wondering why this is never mentioned in any of the tutorials I've seen so far.