As a spin-off from this thread, I've got a problem with CGI.pm and file uploads from UTF-8 HTML pages. On larger uploads (valid binary files such as mp3s) I get "CGI.pm: Server closed socket during multipart read (client aborted?)." on the client/browser side, and apache2's error.log shows multiple lines with "Malformed UTF-8 character (unexpected continuation byte ..."

To replicate the error I use the script from http://www.perlfect.com/articles/upload.shtml

<FORM ENCTYPE="multipart/form-data" ACTION="/cgi-bin/upload.pl" METHOD="POST">
Please choose directory to upload to:<br>
<SELECT NAME="dir">
<OPTION VALUE="images">images</OPTION>
<OPTION VALUE="sounds">sounds</OPTION>
</SELECT>
<p>
Please select a file to upload: <BR>
<INPUT TYPE="FILE" NAME="file">
<p>
<INPUT TYPE="submit">
</FORM>
which is served as "Content-Encoding: utf8;"

Then I upload a file of approx. 800K (smaller files of around 300K do get through, despite the errors) to this script:
#!/usr/bin/perl -CS
use CGI::Carp qw(fatalsToBrowser);
use CGI;
my $cgi = new CGI;
my $file = $cgi->param('file');
$file=~m/^.*(\\|\/)(.*)/; # strip the remote path and keep the filename
my $name = $2;
open(LOCAL, ">/var/www/mypath/$file") or die "$!: path: /var/www/mypath/$name file: $file";
while(<$file>) { print LOCAL $_; }
print $cgi->header();
print "$file has been successfully uploaded... thank you.\n";


Now, the important part is the -CS switch in #!/usr/bin/perl -CS! It tells perl to operate in UTF-8 mode on standard input, standard output and standard error. I saw the same behaviour with other approaches that have the same effect, e.g. with binmode on STDIN, STDOUT etc. set to utf8.
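To make the effect of the switch concrete, here is a minimal sketch of roughly what -CS requests, written as explicit binmode calls, plus a small check for whether a handle is currently flagged as UTF-8. The helper names has_utf8_layer and set_std_handles_utf8 are made up for illustration; PerlIO::get_layers is a Perl built-in.

```perl
use strict;
use warnings;

# True if the handle currently has the utf8 flag set. PerlIO::get_layers
# reports that flag as a "utf8" pseudo-layer in its result list.
# (Hypothetical helper name, not part of any module.)
sub has_utf8_layer {
    my ($fh) = @_;
    my $count = grep { $_ eq 'utf8' } PerlIO::get_layers($fh);
    return $count;
}

# Roughly what -CS asks for, spelled out per handle.
# Note that this includes STDIN, which is where CGI.pm reads the
# multipart request body (and so the uploaded file) from.
sub set_std_handles_utf8 {
    binmode($_, ':utf8') for \*STDIN, \*STDOUT, \*STDERR;
    return;
}
```

Calling has_utf8_layer(\*STDIN) early in the CGI script is one way to confirm whether the request body is about to be read through a UTF-8 layer.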

I need this functionality because I use CGI::Application to serve pages in UTF-8. But it also seems to treat file uploads as UTF-8, which leads to the error I am describing.
For example, setting the switch to #!/usr/bin/perl -COE (= UTF-8 for standard output and standard error only) does not produce the error.
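Independently of how STDIN is configured, the copy step itself can be kept byte-safe. The following is only a sketch under assumptions: copy_binary is a hypothetical helper, and it presumes you already hold a readable filehandle for the upload (CGI.pm's upload('file') method returns one). It does not by itself undo a UTF-8 layer on STDIN; it only ensures the copy to disk neither decodes nor encodes anything.

```perl
use strict;
use warnings;

# Copy everything from $in to $out as raw bytes, in fixed-size chunks.
# Both handles are switched to :raw so no UTF-8 decoding or encoding
# is applied during the copy. Returns the number of bytes copied.
# (Hypothetical helper name, for illustration only.)
sub copy_binary {
    my ($in, $out) = @_;
    binmode($in,  ':raw');
    binmode($out, ':raw');
    my $total = 0;
    while (1) {
        my $n = read($in, my $buf, 8192);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                      # end of input
        print {$out} $buf or die "write failed: $!";
        $total += $n;
    }
    return $total;
}

# Possible use inside the upload script (assumes $cgi and $name exist):
#   my $in = $cgi->upload('file') or die "no upload handle";
#   open(my $out, '>', "/var/www/mypath/$name") or die "open: $!";
#   copy_binary($in, $out);
#   close($out) or die "close: $!";
```

Reading with read() in chunks rather than line by line also avoids treating a binary file as newline-delimited text.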

1. Now, is this behaviour a bug in CGI.pm (not differentiating between form (text) data and file uploads), or is it intended behaviour?

2. And what is the correct application design? Should I use the switch -COE, so that output and error are UTF-8 while input stays :raw, and then convert selected form fields myself via my $param_f = decode("utf8", $q->param("f") )?
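The per-field decoding idea from question 2 could look roughly like the sketch below. decode_field is a hypothetical helper name. It uses Encode's strict "UTF-8" encoding with FB_CROAK, which is a deliberate swap from the lax "utf8" spelling in the question, so that invalid input fails loudly instead of passing through.

```perl
use strict;
use warnings;
use Encode ();

# Decode a raw byte string (e.g. a text form field read while STDIN is
# left undecoded) into a Perl character string. Dies on invalid UTF-8.
# Returns undef for undefined input.
# (Hypothetical helper name, for illustration only.)
sub decode_field {
    my ($bytes) = @_;
    return undef unless defined $bytes;
    return Encode::decode('UTF-8', $bytes,
                          Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Possible use (assumes $q is a CGI object and "f" is a text field):
#   my $param_f = decode_field(scalar $q->param('f'));
```

With this approach only the text fields are decoded, and the uploaded file data is never passed through a decoder.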

In reply to CGI.pm: "Malformed UTF-8 character" in apache's error.log by isync
