PerlMonks  

Unicode With LWP

by zer (Deacon)
on Jan 20, 2008 at 00:26 UTC ( #663252=perlquestion )

zer has asked for the wisdom of the Perl Monks concerning the following question:

Good evening. I am running into a weird bump with some older code of mine. The following code acts as a proxy for a website, changing a few minor things along the way. The problem is with the left and right double quotes that appear on certain websites: when the script prints them, they don't come out the same.
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
binmode(STDOUT,":utf8"); #it doesnt print right with it here or not
my $t=qq(<img border="0px" src="http://server.com/T.png" alt="T.png" />);
my $req = CGI->new();
use LWP::Simple;
my $content= get($req->param('go'));
$content=~s|<div id="jump-to-nav">.*?</div>||g;
$content=~s|<div class="printfooter".*|</body></html>|msg;
$content=~s|"(/tmpwiki/phase3)|"http://server.com$1|g;
$content=~s|<head>.*</head>|<head><meta http-equiv="Content-Type" content="text/html;charset=utf-8" /><title>BTT</title></head>|msg;
$content=~s|(\<img)|$1 border="0px" |msg;
$content=~s|BTT400:||;
$content=~s|$t.*?$t||msg;
if ($req->param("f")){
    print "Content-type: application/x-download\nContent-Disposition:attachment;filename=\"Business Requirement.doc\"\n\n".$content;
}else{
    print $req->header().$content;
}

Replies are listed 'Best First'.
Re: Unicode With LWP
by Juerd (Abbot) on Jan 20, 2008 at 00:36 UTC

    LWP::Simple gives you the content as a byte string, ignoring the charset attribute in the Content-Type header. If you want to pass the data along without decoding it, you will have to use the same charset that your source used, but LWP::Simple didn't provide it.

    You could find it out manually, hard code it, and hope they'll never change it. Or you could hop from LWP::Simple to a more advanced module, like full LWP (LWP::UserAgent). My favourite way of doing this is to call decoded_content on the response and then explicitly re-encode as UTF-8 for output, because I like to standardize on UTF-8 for web stuff.

    Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }
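
The approach described in the reply above can be sketched roughly as follows. This is not Juerd's code, only a minimal illustration. The helper name response_to_utf8 is hypothetical, and the HTTP::Response object is built by hand (with an assumed windows-1252 charset and curly-quote bytes) so the sketch runs without a network; in a real script the response would come from something like LWP::UserAgent->new->get($url).

```perl
use strict;
use warnings;
use HTTP::Response;
use Encode qw(encode);

# Hypothetical helper: take an HTTP::Response, decode its body using the
# charset the server declared, and return the body as UTF-8 encoded bytes.
sub response_to_utf8 {
    my ($res) = @_;

    # decoded_content honours the charset in the Content-Type header and
    # returns a character string (or undef if decoding fails).
    my $text = $res->decoded_content;
    die "could not decode response body\n" unless defined $text;

    # Explicitly re-encode to UTF-8 bytes for output.
    return encode('UTF-8', $text);
}

# Stand-in for a fetched page (assumption: the source site serves
# windows-1252, where bytes 0x93 and 0x94 are the curly double quotes).
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=windows-1252' ],
    "<p>\x93quoted\x94</p>",
);

my $text = $res->decoded_content;    # characters: U+201C ... U+201D
my $utf8 = response_to_utf8($res);   # bytes, ready to print
```

Because $utf8 already holds encoded bytes, it should probably be printed to a STDOUT that has no :utf8 layer (or, alternatively, print the decoded $text and keep the layer); doing both would encode the output twice.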
