in reply to WWW::Mechanize always use utf8

Seems unlikely, show some code please

Re^2: WWW::Mechanize always use utf8
by Anonymous Monk on Mar 21, 2009 at 01:08 UTC

    Yes, here it is. The problem appears on 64-bit Linux and does not appear on Windows Server; I think that is because Windows uses iso-8859-1 as its charset.

    #!/usr/bin/perl -wT
    use strict;
    use CGI qw(param);

    binmode STDOUT, ':encoding(iso-8859-1)';
    print "Content-type: text/html; charset=iso-8859-1\n\n";

    my $q = param('status') || 'nothing';

    print <<HTML;
    <html>
    <body>
    <h1>You has selected $q</h1>
    <form name="form" action="http://localhost/">
    <select name="status">
    <option value="ACCIÓN">ACCIÓN</option>
    <option value="PINGÜINO">PINGÜINO</option>
    </select>
    <input type="submit" name="send" value="send">
    </form>
    </body>
    </html>
    HTML

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    binmode STDOUT, ':encoding(iso-8859-1)';

    my $mech = WWW::Mechanize->new;
    $mech->agent_alias( 'Windows IE 6' );
    $mech->add_header('Accept-Charset' => 'iso-8859-1');
    $mech->get('http://localhost/');
    $mech->submit();

    my ($status) = $mech->find_all_inputs( name => 'status', type => 'option');
    if (ref $status && ref $status->{menu} eq 'ARRAY') {
        for my $option (@{$status->{menu}}) {
            $mech->form_name( 'form' );
            # Here's the problem: a UTF-8 string is sent instead of iso-8859-1
            $mech->select( 'status', $option->{value});
            sleep(2);
            $mech->click('send');
            if ($mech->success()) {
                print $mech->content();    # the response is in UTF-8 too
            }
            $mech->back();
        }
    }

      I can replicate the problem with WWW::Mechanize 1.54, but not with WWW::Mechanize 1.34. (LWP 5.825 in both cases.)

      My goal was to rule out a badly encoded file. I used the following equivalent CGI script for testing:

      #!/usr/bin/perl -wT
      use strict;
      use CGI qw(param);

      binmode STDOUT, ':encoding(iso-8859-1)';
      print "Content-type: text/html; charset=iso-8859-1\n\n";

      my $q = param('status') || 'nothing';

      print <<HTML;
      <html>
      <body>
      <h1>You has selected $q</h1>
      <form name="form" action="http://localhost/">
      <select name="status">
      <option value="ACCI\x{D3}N">ACCI\x{D3}N</option>
      <option value="PING\x{DC}INO">PING\x{DC}INO</option>
      </select>
      <input type="submit" name="send" value="send">
      </form>
      </body>
      </html>
      HTML

      The relevant difference between your systems is not the OS, it's the version of the module.

        Yes ikegami, downgrading the module solves the problem, exactly as you said. Do you think it's a bug worth reporting?

        Thanks a lot.

      First, the CGI code is not portable as you do not specify the encoding of the source code. Either use pure ASCII or add an encoding statement ("use utf8;" if your source code is encoded in UTF-8).
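
      As a rough illustration of why the source encoding can matter here (a sketch, not code from the thread): the literal bytes "\xC3\x93" below stand in for what perl sees when "Ó" is saved as UTF-8 and the file has no "use utf8;". The variable names are made up for the example, and only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption for illustration: the file is saved as UTF-8 and has no "use utf8;",
# so perl sees the two bytes of "Ó" as two separate characters.
my $without_pragma = "\xC3\x93";

# With "use utf8;" (or the ASCII-only escape "\x{D3}") perl sees one character.
my $with_pragma = decode('UTF-8', "\xC3\x93");

# This is the conversion that binmode STDOUT, ':encoding(iso-8859-1)' applies on output.
my $out_without = encode('iso-8859-1', $without_pragma);
my $out_with    = encode('iso-8859-1', $with_pragma);

# Render a byte string as space-separated hex pairs.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

printf "without pragma: %d character(s), output bytes %s\n",
    length $without_pragma, hex_bytes($out_without);
printf "with pragma:    %d character(s), output bytes %s\n",
    length $with_pragma, hex_bytes($out_with);
```

      Only the second form puts the single iso-8859-1 byte D3 on the wire; the first sends the UTF-8 bytes unchanged even though the header says iso-8859-1.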

      Secondly, URL encoding is a historical problem. Originally, URLs were defined as ASCII only. But some people started to encode non-ASCII (8-bit) characters: some using iso-8859-1, some using UTF-8, some using other encodings.
      Then the IETF standardized URL encoding for HTTP as UTF-8.
      For backward compatibility, user agents use the encoding of the document that contains the form to decide which encoding to use in GET URLs. You can change this behavior in MSIE in the advanced settings.
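
      To make the difference concrete, here is a minimal sketch (not from the thread) of how the same option value ends up in a GET URL under each charset. It uses only the core Encode module; percent_encode is a hypothetical helper written for this example, not part of LWP or WWW::Mechanize.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode every byte outside the URL "unreserved" set.
sub percent_encode {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# "ACCIÓN", written with an escape so this source file stays pure ASCII.
my $value = "ACCI\x{D3}N";

# Encode the characters to bytes first, then percent-encode the bytes.
my $latin1 = percent_encode(encode('iso-8859-1', $value));
my $utf8   = percent_encode(encode('UTF-8',      $value));

print "iso-8859-1: status=$latin1\n";    # status=ACCI%D3N
print "UTF-8:      status=$utf8\n";      # status=ACCI%C3%93N
```

      A CGI script that expects iso-8859-1 and receives the second form will see two characters where it expected one.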

      So WWW::Mechanize is working as expected. Change your CGI output to UTF-8 and WWW::Mechanize will probably send the URL encoded as UTF-8.

        First, the CGI code is not portable as you do not specify the encoding of the source code.

        How is it not portable?

        The script will be portable no matter what encoding he specifies as long as the encoding in the following two lines match:

        binmode STDOUT, ':encoding(iso-8859-1)';
        ...
        print "Content-type: text/html; charset=iso-8859-1\n\n";

        The only question is whether the browser can encode using iso-8859-1 or not. I'd be very surprised to meet one that couldn't.

      How can you tell from that code?
        I checked the server's access log.
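
        For anyone repeating that check, here is a rough sketch of how a logged query string can be classified. The sample strings are illustrative, not copied from the actual log, and guess_query_encoding is a hypothetical helper name. It relies on the fact that a lone iso-8859-1 high byte is almost never valid UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: undo the percent-encoding, then test whether the
# resulting bytes form valid UTF-8.
sub guess_query_encoding {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # No high bytes at all: the query is plain ASCII and tells us nothing.
    return 'ASCII' unless $bytes =~ /[^\x00-\x7F]/;

    # A strict decode dies on malformed UTF-8, so eval reports validity.
    my $ok = eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 };
    return $ok ? 'UTF-8' : 'not UTF-8 (possibly iso-8859-1)';
}

for my $query ('status=ACCI%C3%93N', 'status=ACCI%D3N', 'status=nothing') {
    printf "%-22s => %s\n", $query, guess_query_encoding($query);
}
```

        This is only a heuristic: bytes that happen to be valid UTF-8 could still have been meant as something else.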

        Yes, downgrading the version, as ikegami said, solves the problem.