bradcathey has asked for the wisdom of the Perl Monks concerning the following question:

I write web apps that usually require/allow the uploading of files (pdf, jpg, doc, cvs). I have lately been using the following (just a snippet for comparison reasons):

$filename = $sourcepath = $query->upload($file); $filename =~ /([\w. -]+)$/i; $filename = $1; $filename =~ s/ /_/g; my $size = $ENV{CONTENT_LENGTH}; if ($size > $max) { return ($filename,"File is too large to upload.",$size); } open(OUTPUT, ">$path/$filename") or return ($filename, "Cannot open '$ +filename'. Contact Webmaster"); ; binmode($filename); binmode(OUTPUT); while( read($sourcepath, $buffer, 64*2**10) ) { print OUTPUT $buffer; } close(OUTPUT);

However, I just saw this in a recent unrelated node.

my $upload_filehandle = $query->upload("filename"); open UPLOADFILE, ">$upload_dir/$filename"; binmode UPLOADFILE; while ( <$upload_filehandle> ) { print UPLOADFILE; } close UPLOADFILE;

What are the advantages and disadvantages of both? The last one sure looks cleaner--do I need all the stuff in the former? General comments? Thanks.


—Brad
"The important work of moving the world forward does not wait to be done by perfect men." George Eliot

Replies are listed 'Best First'.
Re: File uploading methods compared
by mlh2003 (Scribe) on May 14, 2005 at 13:49 UTC
    It is more secure to use CGI.pm's upload() method to create a filehandle and read from that filehandle than to create a filehandle from user-supplied data in a file upload field. Therefore, it is recommended to use the latter code snippet.

    The link that explains this in more detail is here.

    ADDITION:
    Also, you can set the $CGI::POST_MAX and $CGI::DISABLE_UPLOADS to specify the maximum bytesize of a file that can be uploaded and to enable/disable file uploads, respectively. The link above contains information on those variables as well.
    _______
    Code is untested unless explicitly stated
    mlh2003
Re: File uploading methods compared
by eibwen (Friar) on May 14, 2005 at 14:01 UTC

    $filename = $sourcepath = $query->upload($file); $filename =~ /([\w. -]+)$/i; $filename = $1; $filename =~ s/ /_/g;

    This block appears to be an attempt to return the File::Basename of the upload. Suffice it to say, you should probably leave this to CPAN.

    open(OUTPUT, ">$path/$filename") or return ($filename, "Cannot open '$filename'. Contact Webmaster");

    This is atypical -- at least as far as my perl experience is concerned. I presume you're implementing your error handling outside of this sub, but I don't see why the abstraction is necessary...

    binmode($filename); binmode(OUTPUT);

    OUTPUT is a filehandle, whereas $filename is a scalar variable (and previously regexed at that). The binmode section of perlfunc states that the first argument must be a filehandle. This improper usage appears inconsequential as I presume you no longer use the scalar; nevertheless, only binmode filehandles. I belive you meant to binmode the input and output filehandles, but you've only opened OUTPUT

    while( read($sourcepath, $buffer, 64*2**10) ) { print OUTPUT $buffer; }

    While this may work, you presume that 64k will always be available and you don't appear to check the return and die or warn as appropriate.

    The second example is considerably more "perlish", however it should be noted that it doesn't appear CGI-safe either.

    my $upload_filehandle = $query->upload("filename"); open UPLOADFILE, ">$upload_dir/$filename";

    There is no taint checking, no basename extraction. This is a potential vulnerability.

    I could digress further, but suffice it to say: "if it works, it works" (TIMTOWDI); however please read about taint checking and check the perldoc for the functions you use.

      I certainly agree that you should do taint-checking on the name of the file the users sends you... A simple open UPLOADFILE, ">$upload_dir/$file" could corrupt data depending on the application, and the configuration of the webserver. (even worse would be chdir $upload_dir;open UPLOADFILE, ">$file";

      In my opinion you should ALWAYS use the three arguments version of open (something like: open UF, ">", "$upload_dir/$filename" or die $!). , since this does not allow a mode change... (and a mode change could corrupt one of your files and/or your script's configuration file (wihtout knowing the file's name that is, but that depends on the script), and/or running files (but only when there is no mode, in case this this is impossible since you have ">"))

      As a side note: the OP's code does not need 64kb of data to work, 'read' will try to read a maximum of 64kb of data. (which is quite different)

Re: File uploading methods compared
by zentara (Cardinal) on May 15, 2005 at 11:38 UTC
    I just want to mention mhl2003's comment about

    you can set the $CGI::POST_MAX and $CGI::DISABLE_UPLOADS to specify the maximum bytesize of a file that can be uploaded and to enable/disable file uploads, respectively. The link above contains information on those variables as well.

    I don't do cgi much, but the last time I messed with it, there was a problem with using $CGI::POST_MAX , in that it will actully start uploading the file and continue until the MAX is exceeded. This can be a problem with big files, since it could be used as a "Denial-of-service" attack, where someone could purposely upload large oversized files, and bog down your server. The $ENV{CONTENT_LENGTH} is sent right away, so the cgi script, can cancel the upload immediately.

    This may have been fixed, but I would stick with $ENV{CONTENT_LENGTH}.


    I'm not really a human, but I play one on earth. flash japh
      I don't do cgi much, but the last time I messed with it, there was a problem with using $CGI::POST_MAX , in that it will actully start uploading the file and continue until the MAX is exceeded.
      That because of the nature of HTTP. If you want the client to receive a HTTP response, you have to wait for his HTTP request, and by the time perl starts reading from STDIN, the webserver has probably received all the data the client had to send.

      Some clients don't specify CONTENT_LENGTH.