hotsolutions has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
POSTing UTF-8 data returns "Internal Server Error: Wide Character in syswrite". Example:
use WWW::Mechanize;
use encoding 'utf-8';
binmode STDOUT, ':utf8';
....
$mech->submit_form(form_name => 'form');
I get the "Wide Character" error if $mech->content contains UTF-8 data ... It works if the data is ASCII. What am I missing?
Thanks!


Update:
I tried installing libwww-perl-5.826 and got a new error: "HTTP::Message content must be bytes"
The solution was to edit HTTP/Request/Common.pm, adding the utf8 lines:
$k =~ s/([\\\"])/\\$1/g;  # escape quotes and backslashes
if (utf8::is_utf8($v)){ utf8::encode($v); }
push(@parts, qq(Content-Disposition: form-data; name="$k"$CRLF$CRLF$v));

And removing the "use encoding 'utf-8';" line from the sample script.
Working code:
use WWW::Mechanize;
binmode STDOUT, ':utf8';  # Removes 'wide character' warnings
$url = "http://localhost/test.cgi";
$mech = WWW::Mechanize->new();
$mech->get($url);
if ($mech->success) {
    if ($mech->form_name( 'frmLookup' )) {
        $mech->submit_form(form_name => 'frmLookup');
        print "Success";
    }
}
Example form:
print qq{Status: 200
Content-Type: text/html

<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<FORM NAME="frmLookup" ACTION="test.cgi" METHOD="Post" ENCTYPE="multipart/form-data">
<INPUT TYPE="text" NAME="test" VALUE="刘">
</FORM>
</body>
</html>};

Note: this works for multipart/form-data encoding. A similar edit is required for application/x-www-form-urlencoded encoding.

Replies are listed 'Best First'.
Re: Mechanize/LWP Error "Wide Character in syswrite"
by Anonymous Monk on May 08, 2009 at 08:11 UTC
    What am I missing?
    Try installing libwww-perl-5.826
Re: Mechanize/LWP Error "Wide Character in syswrite"
by ikegami (Patriarch) on May 08, 2009 at 16:13 UTC

    got a new error "HTTP::Message content must be bytes"

    Indeed it does. Sockets can only transmit bytes. As such, the payload of an HTTP message (request or response) can only contain bytes. It is your responsibility to serialize anything that isn't bytes, including text characters. You can use Encode for that.
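
As a rough illustration of that serialization step, here is a minimal sketch using the core Encode module. The variable names are made up for the example, and the value is the single character from the example form; this is not code from LWP or Mechanize.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical form value: one character, U+5218 (the one in the example form)
my $value = "\x{5218}";

# Serialize the characters to UTF-8 bytes before they go into the request body
my $bytes = encode('UTF-8', $value);

# Show the difference between the character string and its byte form
printf "characters: %d, bytes: %d (%s)\n",
    length($value), length($bytes),
    join(' ', map { sprintf '%02X', ord } split //, $bytes);
# prints: characters: 1, bytes: 3 (E5 88 98)
```

The encoded string contains only values below 256, which is what a byte-only payload needs.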

    if (utf8::is_utf8($v)){ utf8::encode($v); }

    That makes no sense. You can't reliably determine whether the content of a string is encoded or decoded based on the internal encoding. See Re: Decoding, Encoding string, how to? (internal encoding). That line should simply be

    utf8::encode($v);

    If there's a problem with that, there's a problem elsewhere.
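
To see why the is_utf8 check is unreliable, consider this small sketch. It assumes nothing beyond core Perl; the two helper subs are hypothetical stand-ins for the conditional edit and the unconditional one.

```perl
use strict;
use warnings;

# Two strings holding the same single character, U+00E9
my $downgraded = "\xE9";
my $upgraded   = "\xE9";
utf8::upgrade($upgraded);   # changes the internal storage only, not the value

print "same text\n" if $downgraded eq $upgraded;
printf "is_utf8: %d vs %d\n",
    (utf8::is_utf8($downgraded) ? 1 : 0),
    (utf8::is_utf8($upgraded)   ? 1 : 0);

# Conditional encoding, as in the edit from the original post
sub encode_if_flagged {
    my ($v) = @_;
    utf8::encode($v) if utf8::is_utf8($v);
    return $v;
}

# Unconditional encoding, as suggested above
sub encode_always {
    my ($v) = @_;
    utf8::encode($v);
    return $v;
}

printf "conditional:   %d vs %d bytes\n",
    length(encode_if_flagged($downgraded)), length(encode_if_flagged($upgraded));
printf "unconditional: %d vs %d bytes\n",
    length(encode_always($downgraded)), length(encode_always($upgraded));
```

The two inputs compare equal, yet the conditional version sends a different number of bytes for each, depending only on how Perl happened to store the string. The unconditional version gives the same result for both.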

    binmode STDOUT, ':utf8';  # Removes 'wide character' warnings

    The following would be better

    use open ':std', ':locale';
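
Both binmode and the open pragma attach an encoding layer to a filehandle, so text is encoded at the point of output. With ':locale' the encoding is taken from the environment; the sketch below instead assumes an explicit UTF-8 layer on an in-memory filehandle so the effect is visible without depending on the locale.

```perl
use strict;
use warnings;

my $text = "\x{5218}";   # one character, U+5218

# Write through an encoding layer into an in-memory buffer
my $buf = '';
open(my $fh, '>:encoding(UTF-8)', \$buf) or die "open failed: $!";
print {$fh} $text;
close($fh) or die "close failed: $!";

# The layer encoded the character on its way out, with no 'wide character' warning
printf "%d character(s) in, %d byte(s) out\n", length($text), length($buf);
# prints: 1 character(s) in, 3 byte(s) out
```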