in reply to CGI.pm: "Malformed UTF-8 character" in apache's error.log

This is expected behaviour since POST data is read from STDIN.

You will have to switch STDIN to binary before reading the binary upload data. If everything in CGI.pm works as I expect, you can probably do a binmode STDIN and possibly binmode $file right before reading from $file.

By the way, I've always distrusted the way param() serves as both the file name and the file handle. If that doesn't work, you may want to use $cgi->upload('file') to get the file handle.
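To illustrate what the suggested binmode call does, here is a minimal sketch that needs no web server. It assumes an in-memory handle opened with a :utf8 layer as a stand-in for a decoding STDIN; the helper name first_line_length and the sample data are made up for this example and are not part of CGI.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: read the first line from an in-memory handle that
# starts out with a :utf8 layer, optionally calling binmode first.
sub first_line_length {
    my ($bytes, $use_binmode) = @_;
    open(my $fh, '<:utf8', \$bytes) or die $!;
    binmode $fh if $use_binmode;    # drop the :utf8 layer, read raw bytes
    my $line = <$fh>;
    close $fh;
    return length $line;
}

# "cafe" with an accented e, encoded as UTF-8, plus a newline:
# 6 bytes, but only 5 characters once decoded.
my $data = "caf\xC3\xA9\n";

printf "with :utf8: %d, after binmode: %d\n",
    first_line_length($data, 0), first_line_length($data, 1);
```

With the layer in place the line is counted in characters; after binmode it is counted in bytes, which is what you want for upload data that is not text.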

Re^2: CGI.pm: "Malformed UTF-8 character" in apache's error.log
by isync (Hermit) on Feb 26, 2008 at 18:54 UTC
        #!/usr/bin/perl -CS
        use CGI;
        use CGI::Carp qw(fatalsToBrowser);

        my $cgi = new CGI;
        my $filename = $cgi->param('file');
        my $fh = $cgi->upload('file');
        binmode $fh;
        open(OUTPUT, ">/var/www/mypath/$filename") or die $!;
        while(<$fh>) {
            print OUTPUT $_;
        }
        print $cgi->header();
        print "$file has been successfully uploaded... thank you.\n";

    This does not work (same error), so switching the filehandle or STDIN to binary (I tried both) just before storing the file does not seem to be possible once perl has been given the -CS switch.

    update: changing binmode $fh; to binmode $fh, ":utf8"; does the trick! (Is this what you meant and does that mean I am still on the right track or is it a hint of a problem?)
      • Don't use :utf8 on untrusted data. Use :encoding(UTF-8). For that matter, don't use :utf8.

      • You decode bytes to chars, but you never encode the chars back to bytes when writing them. You'll get wide character warnings for non-ASCII chars, and you could get a mix of iso-latin-1 and UTF-8 characters in more complex programs.

      • What awful inconsistency in your file handle names: $fh and OUTPUT? And using a global variable too?
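The second point above (decoding without encoding again) can be seen in a small standalone sketch. It assumes in-memory handles in place of real files, and the helper name written_bytes is invented for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: print a character string to an in-memory handle
# opened with the given mode and return the bytes that were written.
sub written_bytes {
    my ($mode, $chars) = @_;
    my $buf = '';
    open(my $out, $mode, \$buf) or die $!;
    print {$out} $chars;
    close $out;
    return $buf;
}

# Two UTF-8 bytes decoded into one character, U+00E9.
my $chars = decode('UTF-8', "\xC3\xA9");

my $no_layer = written_bytes('>', $chars);
my $encoded  = written_bytes('>:encoding(UTF-8)', $chars);

printf "no layer: %s\n", unpack('H*', $no_layer);
printf "encoded:  %s\n", unpack('H*', $encoded);
```

Without an encoding layer the character goes out as the single iso-latin-1 byte e9, so the file no longer matches the UTF-8 input; with the layer the original two bytes c3 a9 come back.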

      Fixed:

          ...
          my $fh_in = $cgi->upload('file');
          binmode $fh_in, ':encoding(UTF-8)';
          open(my $fh_out, '>:encoding(UTF-8)', "/var/www/mypath/$filename") or die $!;
          while (<$fh_in>) {
              print $fh_out $_;
          }

      But there's absolutely no reason to convert to chars in the above code, so you'd be better off as

          ...
          my $fh_in = $cgi->upload('file');
          open(my $fh_out, '>', "/var/www/mypath/$filename") or die $!;
          while (<$fh_in>) {
              print $fh_out $_;
          }

      You also mentioned binary uploads (like MP3s). For that, you'd use

          ...
          my $fh_in = $cgi->upload('file');
          open(my $fh_out, '>', "/var/www/mypath/$filename") or die $!;
          binmode $fh_in;
          binmode $fh_out;
          local $/ = \4096;  # Don't wait to find "\n"
          while (<$fh_in>) {
              print $fh_out $_;
          }

      The code for binary files also works with text files.
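For anyone who wants to try the binary copy loop without a web server, here is a rough sketch. It assumes in-memory handles in place of the upload handle and the output file; the helper name copy_binary and the 10000-byte test pattern are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from one handle to another in
# fixed-size records, with both handles in binary mode.
sub copy_binary {
    my ($fh_in, $fh_out) = @_;
    binmode $fh_in;
    binmode $fh_out;
    local $/ = \4096;    # read 4096-byte records instead of lines
    my $chunks = 0;
    while (my $chunk = <$fh_in>) {
        print {$fh_out} $chunk;
        $chunks++;
    }
    return $chunks;
}

# 10000 bytes covering every byte value, including "\n" and high bytes.
my $source = join '', map { chr($_ % 256) } 0 .. 9999;
my $copy   = '';

open(my $in,  '<', \$source) or die $!;
open(my $out, '>', \$copy)   or die $!;
my $chunks = copy_binary($in, $out);
close $out;
close $in;

print "chunks: $chunks, identical: ", ($copy eq $source ? "yes" : "no"), "\n";
```

Because nothing is decoded or encoded along the way, the copy matches the source byte for byte whether the data is text or not.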