isync has asked for the wisdom of the Perl Monks concerning the following question:

Hi there,

I am uploading files to a webserver with
<form action='/Post' method='post' name='FileUpload' enctype='multipart/form-data'>
<input type='file' name='filename'>
through CGI.pm on a CGI::App, catching the result on the server with
if ($ENV{'REQUEST_METHOD'} eq "POST") {
    my $form = {};
    $form->{filename} = $q->param('filename');
    my $tempfile = $q->tmpFileName($form->{filename});
    return Dumper($form, $tempfile);
    ...
and the Dumper call above gives me some insight into the problem with files whose names contain quotes. For example, when I upload a file named Some file with "quotes".txt, I get this dump back from the server:
$VAR1 = {
          'filename' => bless( \*{'Fh::fh00001Some file with \\'}, 'Fh' ),
        };
$VAR2 = \'/var/tmp/CGItemp5543'
The tmp file ends up on the server okay, but the incomplete filename screws up my script. Does anything spring to mind, such as some stupid escaping I am not taking care of?
The underlying filesystem is Reiser, which allows quotes in file names when they are escaped with \ on creation.

Update:
Note: I adapted the title, as the filehandle seems to work even though its name is truncated.

I am uploading from Linux+Firefox to a Debian server, btw.
After reading the CGI docs on cpan closely again:
"If you want the entered file name for the file, you can just call param():

  $filename = $q->param('field_name');
No, it obviously doesn't.
Different browsers will return slightly different things for the name. Some browsers return the filename only. Others return the full path to the file, using the path conventions of the user's machine. Regardless, the name returned is always the name of the file on the user's machine, and is unrelated to the name of the temporary file that CGI.pm creates during upload spooling (see below).

When a file is uploaded the browser usually sends along some information along with it in the format of headers. The information usually includes the MIME content type. To retrieve this information, call uploadInfo(). It returns a reference to a hash containing all the document headers."
The latter, I then implemented in my script:
my $upload_meta = $q->uploadInfo($form->{filename});
Dumping that returns:
$VAR1 = { 'filename' => bless( \*{'Fh::fh00001Some file with \\'}, 'Fh' ) };
$VAR2 = { 'Content-Type' => 'text/plain', 'Content-Disposition' => 'form-data; name="file"; filename="Some file with \\"quotes\\".txt"' };
There it is, the full filename in all its glory. But: Am I really supposed to parse the Content-Disposition header? I am not even sure if all browser/OS combinations reliably set this header...

Re: Unable to get filename with quotes after upload?
by tobyink (Canon) on Jan 16, 2012 at 14:19 UTC

    As an anonymous monk commented, you should be able to do:

    $form->{filename}->asString

    But... emphasis on the word should. It doesn't work with your example. Firefox is correctly escaping the filename, but CGI.pm is not correctly dealing with the escaped quote.

    On the version of CGI.pm on my system (3.49) the broken filename parsing code is on line 3577. It should be relatively easy to fix.

    my ($filename) = $header{'Content-Disposition'} =~/ filename=(("[^"]*")|([a-z\d!\#'\*\+,\.^_\`\{\}\|\~]*))/i;

    Maybe something like...

    my ($filename) = $header{'Content-Disposition'} =~/ filename=(("(?:\\\"|[^"])*")|([a-z\d!\#'\*\+,\.^_\`\{\}\|\~]*))/i;
      Works.
      As I didn't want to patch my local CGI.pm, I've put this workaround into my code:
      my $upload_meta = $q->uploadInfo($form->{filename});
      my ($filename) = $upload_meta->{'Content-Disposition'} =~/ filename=(("(?:\\\"|[^"])*")|([a-z\d!\#'\*\+,\.^_\`\{\}\|\~]*))/i;
      substr($filename,0,1,'');   # drop the leading quote
      chop($filename);            # drop the trailing quote
      $filename =~ s/\\"/"/g;     # unescape \" to "
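
For anyone who wants to see the difference without setting up an upload, here is a minimal standalone sketch that runs the two patterns quoted above against the header value from the question. The variable names ($old, $new, $truncated, $complete, $name) are only for illustration, and nothing here touches CGI.pm itself.

```perl
use strict;
use warnings;

# Content-Disposition value as reported by uploadInfo() in the question.
# Single quotes keep the backslashes exactly as the browser sent them.
my $header = 'form-data; name="file"; filename="Some file with \"quotes\".txt"';

# Pattern from CGI.pm 3.49: the quoted branch stops at the first '"'.
my $old = qr/ filename=(("[^"]*")|([a-z\d!\#'\*\+,\.^_\`\{\}\|\~]*))/i;

# Patched pattern from this thread: \" is allowed inside the quotes.
my $new = qr/ filename=(("(?:\\\"|[^"])*")|([a-z\d!\#'\*\+,\.^_\`\{\}\|\~]*))/i;

my ($truncated) = $header =~ $old;
my ($complete)  = $header =~ $new;

print "old: $truncated\n";   # old: "Some file with \"
print "new: $complete\n";    # new: "Some file with \"quotes\".txt"

# Strip the surrounding quotes and unescape, as in the workaround above.
(my $name = $complete) =~ s/^"(.*)"$/$1/s;
$name =~ s/\\"/"/g;
print "name: $name\n";       # name: Some file with "quotes".txt
```

The truncated result from the old pattern lines up with the 'Some file with \\' seen in the Dumper output of the filehandle.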
Re: Invalid filehandle on filenames with quotes after upload?
by Anonymous Monk on Jan 16, 2012 at 12:59 UTC
      What exactly is the bit you are referring to? Is it WashName()? Because this also relies on my $filename = $cgi->param( $field ); to get the original albeit possibly dangerous/with-strange-chars filename. So when I would enforce a [A-Z0-9] file-naming convention, it would also fail on getting the original non-canonical name, no?

        ... file-naming convention, it would also fail on getting the original non-canonical name, no?

        What?

Re: Unable to get filename with quotes after upload?
by Anonymous Monk on Jan 16, 2012 at 13:39 UTC

    There it is, the full filename in all its glory. But: Am I really supposed to parse the Content-Disposition header? I am not even sure if all browser/OS combinations reliably set this header...

    Um, absolutely not :) param('upload_field') returns an object: use it as a filehandle and it is a filehandle; use it as a string and it is the filename from the Content-Disposition header.

    You can also use ->asString or '' . param('upload_field')

      I see the problem now

      It is a bug in CGI

      # See RFC 1867, 2183, 2045
      # NB: File content will be loaded into memory should
      # content-disposition parsing fail.
      my ($filename) = $header{'Content-Disposition'} =~/ filename=(("[^"]*")|([a-z\d!\#'\*\+,\.^_\`\{\}\|\~]*))/i;
      $filename ||= ''; # quench uninit variable warning
      $filename =~ s/^"([^"]*)"$/$1/;

      It is a bug in CGI::Simple

      my ( $param ) = $unfold =~ m/form-data;\s+name="?([^\";]*)"?/;
      my ( $filename ) = $unfold =~ m/name="?\Q$param\E"?;\s+filename="?([^\"]*)"?/;

      Example quoted string request

      my $VAR1 = "POST http://localhost/cgi-bin/upload_quotes.pl\nContent-Length: 138\nContent-Type:".
      " multipart/form-data; boundary=xYzZY\n\n--xYzZY\r\nContent-Disposition: form".
      "-data; name=\"file\"; filename=\"stupid \\\"quoted\\\" filename.txt\"\r\n-Content:".
      " stupid content\r\n\r\n\r\n--xYzZY--\r\n";

      I'd have suggested a regex (I've seen one many times), but I'm not up on the RFCs

      http://tools.ietf.org/html/rfc1521

      value := token / quoted-string

      token := 1*<any (ASCII) CHAR except SPACE, CTLs, or tspecials>

      tspecials := "(" / ")" / "<" / ">" / "@"
                /  "," / ";" / ":" / "\" / <">
                /  "/" / "[" / "]" / "?" / "="
                ; Must be in quoted-string,
                ; to use within parameter values

      http://tools.ietf.org/html/rfc2822#section-3.2.5

      FWS           = ([*WSP CRLF] 1*WSP) /   ; Folding white space
                      obs-FWS

      ctext         = NO-WS-CTL /     ; Non white space controls
                      %d33-39 /       ; The rest of the US-ASCII
                      %d42-91 /       ;  characters not including "(",
                      %d93-126        ;  ")", or "\"

      ccontent      = ctext / quoted-pair / comment

      comment       = "(" *([FWS] ccontent) [FWS] ")"

      CFWS          = *([FWS] comment) (([FWS] comment) / FWS)

      qtext         = NO-WS-CTL /     ; Non white space controls
                      %d33 /          ; The rest of the US-ASCII
                      %d35-91 /       ;  characters not including "\"
                      %d93-126        ;  or the quote character

      qcontent      = qtext / quoted-pair

      quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE
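
Going by the grammar above, a parameter value is either a token or a quoted-string, and inside a quoted-string a backslash escapes whatever character follows it (quoted-pair). A rough sketch of a helper built on that reading might look like the following. Note that header_param is a hypothetical name, not something CGI.pm or CGI::Simple provides, and the sketch ignores comments, folding white space, and values that themselves contain something that looks like "; name=".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI.pm or CGI::Simple).
# Returns the value of parameter $name from a header value,
# or nothing if the parameter is not present.
sub header_param {
    my ($header, $name) = @_;

    # quoted-string: DQUOTE *( qtext / quoted-pair ) DQUOTE
    if ($header =~ /;\s*\Q$name\E="((?:[^"\\]|\\.)*)"/is) {
        my $value = $1;
        $value =~ s/\\(.)/$1/gs;   # quoted-pair: drop the backslash, keep the char
        return $value;
    }

    # token: one or more chars that are not SPACE, CTLs, or tspecials
    if ($header =~ /;\s*\Q$name\E=([^\s\x00-\x1f\x7f()<>\@,;:\\"\/\[\]?=]+)/i) {
        return $1;
    }

    return;
}

my $cd = 'form-data; name="file"; filename="stupid \"quoted\" filename.txt"';
print header_param($cd, 'filename'), "\n";   # stupid "quoted" filename.txt
print header_param($cd, 'name'), "\n";       # file
```

Unescaping every quoted-pair, rather than only \", also covers a browser that sends an escaped backslash as \\.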