in reply to LWP and UTF-8

Can you show the response headers from your intranet site? You can record them with the LiveHTTPHeaders extension for Firefox or with HTTP::Recorder.

Re^2: LWP and UTF-8
by perlmonkdr (Beadle) on Oct 30, 2007 at 13:41 UTC

    Yes, sure, an extract is below; the other requests (3 .. 5) are images. I know what you want to check: the common error is not sending a proper Content-Type header with the charset. But it is correct here; in fact, IE, Opera and Firefox all display the content correctly.

    +++GET 1+++
    GET /api/4512268 HTTP/1.0
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwave-flash, */*
    Accept-Language: en-us
    UA-CPU: x86
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
    Host: localhost
    Cookie: host=A4C779;JSESSIONID=E91C79C1A1A125A16F6586457AFF5C20
    Connection: keep-alive
    +++RESP 1+++
    HTTP/1.0 200 OK
    Content-Type: text/html;charset=UTF-8
    Transfer-Encoding: chunked
    Date: Tue, 30 Oct 2007 13:15:06 GMT
    Server: Apache-Coyote/1.1
    +++GET 3+++
    .....
    +++CLOSE 5+++
    .....
    +++CLOSE 1+++

    Well, thanks Gangabass.

      OK. Can you print the results (with headers) to a file and put this file somewhere on the web?

        Hi, thanks again. I appreciate your concern, but the page is correctly encoded and I need to solve this problem really fast. The problem is somewhere in LWP's internals; where exactly, I don't know, but I think reinventing the wheel is faster in this case. The solution was to write a new connection handler. This is the code:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use open ':utf8';
        use Socket;
        use utf8;

        $/ = "\012";
        my %cookie = ();   # shared by every request made during the run
        my $count  = 0;    # number of redirects followed so far

        sub geturl {
            my $url = shift;
            my ($host, $path) = ($url =~ m{http://([^/]+)/?(.*)}i);
            return unless $host;
            socket(my $sock, AF_INET, SOCK_STREAM, (getprotobyname('tcp'))[2])
                || die $!;
            if (connect($sock, sockaddr_in(80, inet_aton($host)))) {
                select((select($sock), $| = 1)[0]);
                # send the request; the Cookie line carries its own CRLF so
                # the blank line that ends the headers is always sent
                print $sock "GET /${path} HTTP/1.0\015\012".
                    "Host: ${host}\015\012".
                    "User-Agent: Opera/9.23 (X11; Linux i686; U; en)\015\012".
                    "Accept-Language: en-US\015\012".
                    "Accept-Charset: utf-8\015\012".
                    "Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*\015\012".
                    ((scalar keys %cookie)
                        ? "Cookie: ".join("; ", map {"$_=$cookie{$_}"} keys %cookie)."\015\012"
                        : "").
                    "\015\012";
                my $rin = "";
                vec($rin, fileno($sock), 1) = 1;
                # wait for the first bytes, with a timeout
                select($rin, undef, undef, 30) || die "no response within 30 seconds\n";
                my $file = "";
                # read the content; check that the alarm signal works on your system
                eval {
                    local $SIG{ALRM} = sub { die "timeout\n" };
                    alarm 30;
                    $file .= $_ while read($sock, $_, 10240);
                    alarm 0;
                };
                # shut the socket down before closing it
                shutdown $sock, 2;
                close $sock;
                die($@) if $@ eq "timeout\n";
                # split head and body at the first blank line
                my ($head, $body) = split /\015\012\015\012/, $file, 2;
                return unless defined $head;
                # set cookies
                foreach my $hl (split(/\015\012/, $head)) {
                    if ($hl =~ m{^Set-Cookie[0-9]*:\s*([^=]+)=([^;]*)}i) {
                        # name and value both arrive already encoded
                        $cookie{$1} = $2;
                    }
                }
                # detect status
                if ($head =~ m{^HTTP/[0-9.]{3}\s+30[1-3]\s}i) {
                    # detect redirect; capture only the URL on the Location line
                    if ($head =~ m{^Location:\s*(\S+)}mi) {
                        # per the RFC this must be absolute, but some sites send
                        # relative URLs; resolve it with URI if you need that
                        return geturl($1) if ++$count < 10;
                    }
                }
                return $body;
            }
            return;
        }

        open F, '>:utf8', './save.html' or die $!;
        print F geturl('http://www.yahoo.com');
        close F;

        The cookies must persist for the whole run of the script, because I need it to work just like this.

        Thanks anyway.