Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I'm using WWW::Mechanize to connect to a page that is encoded in iso-8859-1. First, I use find_all_inputs to get a select input; then, for each option of the select, I make another request. The problem is that when I make this second request, some values of the select use Latin-1 characters beyond ASCII, but Mechanize converts them to UTF-8 or something like that.

To be clearer: instead of sending http://localhost/?field=ACCI%D3N, Mechanize sends http://localhost/?field=ACCI%C3%93N

This makes it clear to me that I don't really understand how encoding works in Perl.

Replies are listed 'Best First'.
Re: WWW::Mechanize always use utf8
by Anonymous Monk on Mar 21, 2009 at 00:22 UTC
    Seems unlikely, show some code please

      Yes, here it is. The problem appears on 64-bit Linux and doesn't appear on Windows Server; I think that is because Windows uses iso-8859-1 as its charset.

      #!/usr/bin/perl -wT
      use strict;
      use CGI qw(param);

      binmode STDOUT, ':encoding(iso-8859-1)';
      print "Content-type: text/html; charset=iso-8859-1\n\n";
      my $q = param('status') || 'nothing';

      print <<HTML;
      <html>
      <body>
      <h1>You has selected $q</h1>
      <form name="form" action="http://localhost/">
      <select name="status">
      <option value="ACCIÓN">ACCIÓN</option>
      <option value="PINGÜINO">PINGÜINO</option>
      </select>
      <input type="submit" name="send" value="send">
      </form>
      </body>
      </html>
      HTML

      #!/usr/bin/perl
      use strict;
      use warnings;
      use WWW::Mechanize;

      binmode STDOUT, ':encoding(iso-8859-1)';

      my $mech = WWW::Mechanize->new;
      $mech->agent_alias( 'Windows IE 6' );
      $mech->add_header('Accept-Charset' => 'iso-8859-1');
      $mech->get('http://localhost/');
      $mech->submit();

      my ($status) = $mech->find_all_inputs( name => 'status', type => 'option');
      if (ref $status && ref $status->{menu} eq 'ARRAY') {
          for my $option (@{$status->{menu}}) {
              $mech->form_name( 'form' );
              # Here is the problem: a UTF-8 string is sent instead of iso-8859-1
              $mech->select( 'status', $option->{value});
              sleep(2);
              $mech->click('send');
              if ($mech->success()) {
                  print $mech->content(); # the response is in UTF-8 too
              }
              $mech->back();
          }
      }

        I can replicate the problem with WWW::Mechanize 1.54, but not with WWW::Mechanize 1.34. (LWP 5.825 in both cases.)

        My goal was to rule out a badly encoded file. I used the following equivalent CGI script for testing:

        #!/usr/bin/perl -wT
        use strict;
        use CGI qw(param);

        binmode STDOUT, ':encoding(iso-8859-1)';
        print "Content-type: text/html; charset=iso-8859-1\n\n";
        my $q = param('status') || 'nothing';

        print <<HTML;
        <html>
        <body>
        <h1>You has selected $q</h1>
        <form name="form" action="http://localhost/">
        <select name="status">
        <option value="ACCI\x{D3}N">ACCI\x{D3}N</option>
        <option value="PING\x{DC}INO">PING\x{DC}INO</option>
        </select>
        <input type="submit" name="send" value="send">
        </form>
        </body>
        </html>
        HTML

        The relevant difference between your systems is not the OS, it's the version of the module.

        First, the CGI code is not portable because you do not specify the encoding of the source code. Either use pure ASCII or add an encoding declaration ("use utf8;" if your source code is encoded in UTF-8).
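
As a rough illustration of the "pure ASCII" option, the sketch below writes the accented letter as an escape, so the string does not depend on how the source file is saved. It assumes only the core Encode module; the variable names are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Pure-ASCII source: the accented letter is written as an escape, so the
# string holds the same six characters however the file itself is saved.
my $word = "ACCI\x{D3}N";    # U+00D3 is LATIN CAPITAL LETTER O WITH ACUTE

# The byte representation is only chosen when the string is encoded.
my $latin1 = encode('iso-8859-1', $word);
my $utf8   = encode('UTF-8', $word);

printf "characters: %d\n", length($word);
printf "iso-8859-1: %s (%d bytes)\n", unpack('H*', $latin1), length($latin1);
printf "UTF-8:      %s (%d bytes)\n", unpack('H*', $utf8), length($utf8);
```

The same character ends up as the single byte D3 in iso-8859-1 and as the two bytes C3 93 in UTF-8, which is the difference visible in the two URLs from the question.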

        Secondly, URL encoding is a historical problem. Originally, URLs were defined as ASCII only. But some people started to percent-encode non-ASCII (8-bit) characters: some using iso-8859-1, some using UTF-8, some using other encodings.
        Then the IETF standardized URL encoding for HTTP as UTF-8.
        For backward compatibility, user agents use the encoding of the document containing the form to decide which encoding to use in GET URLs. You can change this behavior in MSIE in the advanced settings.
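
The behavior described above can be sketched roughly as follows: take the charset declared for the document that contains the form, encode the field value into that charset, and percent-encode the resulting bytes. This is only an illustration using the core Encode module, not how WWW::Mechanize or any browser is actually implemented. The helpers charset_of, percent_encode and get_url are hypothetical names, and the escaping is simplified (real form submission also turns spaces into "+").

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: extract the charset from a Content-Type value,
# falling back to UTF-8 when none is declared.
sub charset_of {
    my ($content_type) = @_;
    return $content_type =~ /charset\s*=\s*"?([\w.:-]+)/i ? $1 : 'UTF-8';
}

# Hypothetical helper: turn characters into bytes in the given charset,
# then percent-encode every byte outside the RFC 3986 unreserved set.
sub percent_encode {
    my ($text, $charset) = @_;
    my $bytes = encode($charset, $text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Hypothetical helper: build a GET URL for one form field, using the
# charset of the page the form came from.
sub get_url {
    my ($action, $field, $value, $content_type) = @_;
    my $charset = charset_of($content_type);
    return sprintf '%s?%s=%s', $action,
        percent_encode($field, $charset), percent_encode($value, $charset);
}

my $value      = "ACCI\x{D3}N";
my $latin1_url = get_url('http://localhost/', 'field', $value,
                         'text/html; charset=iso-8859-1');
my $utf8_url   = get_url('http://localhost/', 'field', $value,
                         'text/html; charset=utf-8');

print "$latin1_url\n$utf8_url\n";
```

With the iso-8859-1 page this yields http://localhost/?field=ACCI%D3N, and with the UTF-8 page it yields http://localhost/?field=ACCI%C3%93N, the two forms quoted in the question.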

        So WWW::Mechanize is working as expected. Change your CGI output to UTF-8 and WWW::Mechanize will probably send the URL encoded as UTF-8.
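
A minimal sketch of what "change your CGI output to UTF-8" could look like, assuming the page is built as a character string and encoded once on output so that the header and the body always agree. render_page is a hypothetical helper, the form markup is abbreviated from the test script, and HTML escaping of the option values is left out for brevity.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build the whole CGI response (header plus body) as
# bytes in the requested charset. The charset named in the header is the
# same one used to encode the body.
sub render_page {
    my ($charset, @options) = @_;
    my $items = join '',
        map { sprintf '<option value="%s">%s</option>', $_, $_ } @options;
    my $html = '<html><body><form name="form" action="http://localhost/">'
             . qq{<select name="status">$items</select>}
             . '<input type="submit" name="send" value="send">'
             . "</form></body></html>\n";
    return "Content-type: text/html; charset=$charset\n\n"
         . encode($charset, $html);
}

my @options     = ("ACCI\x{D3}N", "PING\x{DC}INO");
my $latin1_page = render_page('iso-8859-1', @options);
my $utf8_page   = render_page('UTF-8', @options);

# A real CGI script would write the bytes to STDOUT without an encoding layer:
# binmode STDOUT; print $utf8_page;
printf "iso-8859-1 response: %d bytes\n", length($latin1_page);
printf "UTF-8 response:      %d bytes\n", length($utf8_page);
```

Because the response is already bytes, an additional ':encoding(...)' layer on STDOUT would encode it a second time, so this sketch would be printed through a plain binmode.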
        How can you tell from that code?