perlmonkdr has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm developing an application that uses LWP to retrieve some info. The page that LWP queries is encoded in UTF-8.

use LWP::UserAgent;
use HTML::Entities;
use HTTP::Cookies;
use open ':utf8';
use utf8;
use locale;
use POSIX 'locale_h';

setlocale(LC_ALL, 'en_US.UTF-8');

my $ua = LWP::UserAgent->new;
$ua->default_header(
    'Accept-Language' => 'en-US',
    'Accept-Charset'  => 'utf-8',
    'Accept'          => 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*'
);
#$ua->parse_head(0);

my $req = HTTP::Request->new(GET => 'http://www.yahoo.com');
print $ua->request($req)->as_string;

I have set and changed everything I could find on Google, including some changes in LWP::Protocol, but nothing works...

What can I do, before I cry...

Re: LWP and UTF-8
by moritz (Cardinal) on Oct 29, 2007 at 21:37 UTC
    I just tested it with LWP::Simple:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;
    use LWP::Simple;

    my $content = get('http://perl-6.de');
    if ($content) {
        $content = decode('utf-8', $content);
        # now work with $content here

        # assuming your terminal is set to utf-8 as well:
        print encode('utf-8', $content);
    }

      Thanks for your help, Moritz. I had already tried the Encode module too, but it didn't work. I would like to test with LWP::Simple, but unfortunately I need to use persistent cookies.

      Thanks again. Perhaps I will need to work through the internal modules, step by step.

      If anyone knows how to solve this problem, please tell me.
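
For readers who need both the cookie jar and the explicit decode step shown in moritz's reply, here is a minimal sketch that uses LWP::UserAgent in place of LWP::Simple. The cookie file path and the helper names `make_agent` and `fetch_text` are hypothetical, and the code assumes the page really is UTF-8; it is an illustration, not the fix that was eventually found in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use File::Spec;
use HTTP::Cookies;
use LWP::UserAgent;

# Hypothetical cookie file location; any writable path would do.
my $cookie_file = File::Spec->catfile(File::Spec->tmpdir, 'lwp_cookies_example.txt');

# Build a user agent whose cookies are loaded from and saved to $file.
sub make_agent {
    my ($file) = @_;
    my $jar = HTTP::Cookies->new(file => $file, autosave => 1);
    my $ua  = LWP::UserAgent->new;
    $ua->cookie_jar($jar);
    return $ua;
}

# Fetch $url and return the body as a character string, or undef on failure.
# Assumption: the body is UTF-8, as in moritz's example.
sub fetch_text {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    return unless $res->is_success;
    return decode('UTF-8', $res->content);
}

my $ua = make_agent($cookie_file);
```

A call such as `my $text = fetch_text($ua, 'http://perl-6.de');` would then return decoded text that can be printed with `encode('utf-8', $text)` as in the reply above. Note that HTTP::Cookies only writes session cookies to the file when the jar is created with `ignore_discard => 1`.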

Re: LWP and UTF-8
by Your Mother (Archbishop) on Oct 29, 2007 at 22:54 UTC

    I think you have a couple of problems (most of the stuff you're showing is unnecessary for utf-8 web stuff and you can let LWP DWIW and return decoded_content from your response objects). The real problem is that Google will not allow LWP requests for queries so whatever real code you're trying to run is not going to fly like that (and I'd rather not explain how to get around it). They do have a developers' kit/token for making queries which you should check out: Google APIs.

      Thanks, but I'm not retrieving information from Google; I'm retrieving it from an intranet that uses UTF-8. As chance would have it, Google uses ISO-8859-1, so I've updated the example to yahoo.com.

      You're right about decoded_content. I left it out of the previous code because I wanted to show you that the complete page, headers included, is wrong, but yes, I'm using

      $res->decoded_content((charset => 'utf-8'));

      Unfortunately it doesn't work.

      Please note that LWP has, or had, a bug related to this problem... or at least that is what I've seen in many forums found through Google. Nothing there worked for me, which is why I'm here with the monks :-).

      Many thanks.
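
The difference between `content` and `decoded_content` discussed in this sub-thread can be seen without a network connection. The sketch below builds a response object by hand, assuming a UTF-8 body and the same Content-Type value the intranet server is reported to send; the HTML snippet is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;

# Stand-in for a real reply: UTF-8 bytes plus a Content-Type header
# that declares the charset. The body text is invented for this example.
my $bytes = encode('UTF-8', "<p>caf\x{e9}</p>");
my $res   = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html;charset=UTF-8' ],
    $bytes,
);

my $raw  = $res->content;          # undecoded bytes as received
my $text = $res->decoded_content;  # characters, decoded using the declared charset

# Encode again on output; this assumes the terminal expects UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
printf "raw length: %d, decoded length: %d\n", length($raw), length($text);
print $text, "\n";
```

Once the text is decoded, it has to be encoded again on the way out, either with `encode` as in moritz's reply or with an output layer as here. One thing that may be worth checking in the original script: according to the `open` pragma documentation, `use open ':utf8'` on its own does not change the already open STDOUT; the `:std` form or an explicit `binmode` is needed for that.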

Re: LWP and UTF-8
by Gangabass (Vicar) on Oct 30, 2007 at 09:23 UTC

    Can you show response headers from your Intranet site? You can record them with LiveHTTPHeaders extension for Firefox or with HTTP::Recorder.

      Yes, sure, below is an extract; the other requests, 3 to 5, are images. I know what you want to see: the common error is not sending a proper Content-Type header with the encoding. But the header is right here, and in fact IE/Opera/FF show the content correctly.

      +++GET 1+++
      GET /api/4512268 HTTP/1.0
      Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
      Accept-Language: en-us
      UA-CPU: x86
      User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
      Host: localhost
      Cookie: host=A4C779;JSESSIONID=E91C79C1A1A125A16F6586457AFF5C20
      Connection: keep-alive
      +++RESP 1+++
      HTTP/1.0 200 OK
      Content-Type: text/html;charset=UTF-8
      Transfer-Encoding: chunked
      Date: Tue, 30 Oct 2007 13:15:06 GMT
      Server: Apache-Coyote/1.1
      +++GET 3+++
      .....
      +++CLOSE 5+++
      .....
      +++CLOSE 1+++
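
The same check can be made from the Perl side by asking the response object which charset the server declared. This is a minimal sketch that rebuilds the response headers from the trace above by hand (the body is omitted); it assumes an installed HTTP::Headers recent enough to provide `content_type_charset`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Response headers taken from the trace above; no body is attached.
my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type' => 'text/html;charset=UTF-8',
        'Date'         => 'Tue, 30 Oct 2007 13:15:06 GMT',
        'Server'       => 'Apache-Coyote/1.1',
    ],
);

my $media_type = $res->content_type;          # media type without parameters
my $charset    = $res->content_type_charset;  # declared charset, or undef if absent

print "media type: $media_type\n";
print "charset:    ", (defined $charset ? $charset : '(none declared)'), "\n";
```

With a live request, the same two calls on the object returned by `$ua->request($req)` would show whether LWP sees the charset that the browsers see.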

      Well, thanks Gangabass

        OK. Can you print the results (with headers) to a file and put this file somewhere on the web?