in reply to Re: WWW::Mechanize always use utf8
in thread WWW::Mechanize always use utf8

Yes, here it is. The problem appears on 64-bit Linux and does not appear on Windows Server, I think because Windows uses iso-8859-1 as its charset.

#!/usr/bin/perl -wT
use strict;
use CGI qw(param);

binmode STDOUT, ':encoding(iso-8859-1)';
print "Content-type: text/html; charset=iso-8859-1\n\n";

my $q = param('status') || 'nothing';

print <<HTML;
<html>
<body>
<h1>You has selected $q</h1>
<form name="form" action="http://localhost/">
<select name="status">
<option value="ACCIÓN">ACCIÓN</option>
<option value="PINGÜINO">PINGÜINO</option>
</select>
<input type="submit" name="send" value="send">
</form>
</body>
</html>
HTML

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

binmode STDOUT, ':encoding(iso-8859-1)';

my $mech = WWW::Mechanize->new;
$mech->agent_alias( 'Windows IE 6' );
$mech->add_header('Accept-Charset' => 'iso-8859-1');
$mech->get('http://localhost/');
$mech->submit();

my ($status) = $mech->find_all_inputs( name => 'status', type => 'option');
if (ref $status && ref $status->{menu} eq 'ARRAY') {
    for my $option (@{$status->{menu}}) {
        $mech->form_name( 'form' );
        # Here's the problem: a UTF-8 string is sent instead of iso-8859-1
        $mech->select( 'status', $option->{value});
        sleep(2);
        $mech->click('send');
        if ($mech->success()) {
            print $mech->content(); # the response is in UTF-8 too
        }
        $mech->back();
    }
}

Re^3: WWW::Mechanize always use utf8
by ikegami (Patriarch) on Mar 22, 2009 at 06:58 UTC

    I can replicate the problem with WWW::Mechanize 1.54, but not with WWW::Mechanize 1.34. (LWP 5.825 in both cases.)

    My goal was to rule out a badly encoded file. I used the following equivalent cgi script for testing:

    #!/usr/bin/perl -wT
    use strict;
    use CGI qw(param);

    binmode STDOUT, ':encoding(iso-8859-1)';
    print "Content-type: text/html; charset=iso-8859-1\n\n";

    my $q = param('status') || 'nothing';

    print <<HTML;
    <html>
    <body>
    <h1>You has selected $q</h1>
    <form name="form" action="http://localhost/">
    <select name="status">
    <option value="ACCI\x{D3}N">ACCI\x{D3}N</option>
    <option value="PING\x{DC}INO">PING\x{DC}INO</option>
    </select>
    <input type="submit" name="send" value="send">
    </form>
    </body>
    </html>
    HTML

    The relevant difference between your systems is not the OS, it's the version of the module.
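Since the behaviour depends on the module version, it can help to print the installed versions on each machine before comparing results. The following is only a sketch: `module_version` is a hypothetical helper, not part of WWW::Mechanize or LWP, and it returns undef when a module cannot be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# or undef if the name is malformed or the module cannot be loaded.
sub module_version {
    my ($module) = @_;
    return undef unless $module =~ /\A\w+(?:::\w+)*\z/;
    my $loaded = eval "require $module; 1";
    return $loaded ? $module->VERSION : undef;
}

for my $module (qw(WWW::Mechanize LWP)) {
    my $version = module_version($module);
    printf "%-16s %s\n", $module, defined $version ? $version : 'not installed';
}
```

Running this on both the Linux and the Windows machine should show whether they really have the same WWW::Mechanize installed.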

      Yes ikegami, downgrading the module solves the problem, exactly as you said. Do you think it's a bug worth reporting?

      Thanks a lot.

        I'm fairly sure it's a bug, though not certain. It's definitely inconsistent with the behaviour of browsers.
Re^3: WWW::Mechanize always use utf8
by dolmen (Beadle) on Mar 23, 2009 at 12:34 UTC
    First, the CGI code is not portable as you do not specify the encoding of the source code. Either use pure ASCII () or add an encoding statement ("use utf8;" if your source code is encoded in UTF-8).

    Secondly, URL encoding is a historical problem. Originally URLs were defined as ASCII-only. But some people started to encode non-ASCII (8-bit) characters: some using iso-8859-1, some UTF-8, some other encodings.
    Then the IETF normalized the URL encoding for HTTP as UTF-8.
    For backward compatibility, user agents use the encoding of the document that contains the form to decide which encoding to use in GET URLs. You can change this behavior in MSIE in the advanced settings.

    So WWW::Mechanize is working as expected. Change your CGI output to UTF-8 and WWW::Mechanize will probably send URLs encoded as UTF-8.
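To make the difference between the two encodings concrete, here is a minimal sketch that percent-encodes the two option values from the test form both ways, using only the core Encode module. The helper names `percent_encode` and `form_value` are made up for this illustration; this is not how WWW::Mechanize or LWP builds the query string internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Percent-encode every byte outside the RFC 3986 unreserved set.
sub percent_encode {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# Hypothetical helper: encode a character string in the given charset,
# then percent-encode the resulting bytes.
sub form_value {
    my ($charset, $chars) = @_;
    return percent_encode(encode($charset, $chars));
}

for my $value ("ACCI\x{D3}N", "PING\x{DC}INO") {
    print "iso-8859-1: status=", form_value('iso-8859-1', $value), "\n";
    print "UTF-8:      status=", form_value('UTF-8',      $value), "\n";
}
```

In iso-8859-1 each accented letter becomes a single escaped byte (`%D3`, `%DC`), while in UTF-8 it becomes two (`%C3%93`, `%C3%9C`), which is the difference visible in the server's access log.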

      First, the CGI code is not portable as you do not specify the encoding of the source code.

      How is it not portable?

      The script will be portable no matter what encoding he specifies as long as the encoding in the following two lines match:

      binmode STDOUT, ':encoding(iso-8859-1)';
      ...
      print "Content-type: text/html; charset=iso-8859-1\n\n";

      The only question is whether the browser can encode using iso-8859-1 or not. I'd be very surprised to meet one that couldn't.
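One way to guarantee that the two lines always match is to derive both from a single variable. A small sketch, assuming a hypothetical `content_type_header` helper that is not part of CGI.pm:

```perl
use strict;
use warnings;

# Single source of truth for the output charset (assumption: iso-8859-1,
# as in the original script).
my $charset = 'iso-8859-1';

# Hypothetical helper: build the Content-type header for a given charset.
sub content_type_header {
    my ($cs) = @_;
    return "Content-type: text/html; charset=$cs\n\n";
}

# The output layer and the declared charset now come from the same value.
binmode STDOUT, ":encoding($charset)";
print content_type_header($charset);
```

Changing `$charset` to `UTF-8` then switches both the output layer and the declared charset together.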

Re^3: WWW::Mechanize always use utf8
by Anonymous Monk on Mar 22, 2009 at 08:08 UTC
    How can you tell from that code?
      I checked the server's access log.
        I would print CGI->query_string; :)
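Whether the query string comes from the access log or from the CGI script itself, the raw string can be checked mechanically. The sketch below undoes the percent-encoding and tests whether the bytes are valid UTF-8. `percent_decode` and `guess_encoding` are hypothetical helpers written for this example, and the result is only a heuristic: bytes that fail the UTF-8 check are probably, but not certainly, iso-8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Undo percent-encoding, yielding the raw bytes the client sent.
sub percent_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper: classify a raw query string as ASCII-only,
# valid UTF-8, or something else.
sub guess_encoding {
    my ($raw) = @_;
    my $bytes = percent_decode($raw);
    return 'ascii' unless $bytes =~ /[\x80-\xFF]/;
    # Strict decode: croaks on malformed UTF-8; LEAVE_SRC keeps $bytes intact.
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 'utf-8' : 'not utf-8 (possibly iso-8859-1)';
}

for my $query ('status=ACCI%D3N&send=send', 'status=ACCI%C3%93N&send=send') {
    print "$query => ", guess_encoding($query), "\n";
}
```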

      Yes, downgrading the version, as ikegami said, solves the problem.