Re^4: UTF8 issue when getting website via LWP::UserAgent in Perl

by ultranerds (Hermit)
on May 12, 2016 at 14:58 UTC


in reply to Re^3: UTF8 issue when getting website via LWP::UserAgent in Perl
in thread UTF8 issue when getting website via LWP::UserAgent in Perl

Here is some example code showing where the issue comes from:

    use LWP::UserAgent;
    use HTTP::Request::Common qw(GET);

    my $ua = LWP::UserAgent->new;

    # Define user agent type
    $ua->agent('Mozilla/8.0');

    # Request object
    my $req = GET 'http://www.gazeta.ru/culture/2016/04/22/a_8191769.shtml';

    # Make the request
    my $res = $ua->request($req);

    binmode STDOUT, ":utf8";
    print "Content-Type: text/html; charset=utf-8 \n\n";

    use Encode;

    if ($res->is_success) {
        my $title;
        $res->decoded_content =~ /<title>(.+?)<\/title>/ and $title = $1;

        # prints correctly here!
        print "GOT TITLE: $title \n";

        $DB->table("ReadingGrabCache")->add( { title => $title, url => "Foo" });
        my $test = $DB->table("ReadingGrabCache")->select ( { url => "Foo" })->fetchrow_hashref;

        # buggered content here
        print "BLA: $test->{title} \n<br>";
    } else {
        print $res->status_line . "\n";
    }


The DB module is encoding-insensitive (i.e. it's not doing any kind of conversion), so I'm confused about how the title can be fine here and then broken when it is fetched back :(

Checking it in phpMyAdmin also shows the issue:

Румыния не будет участвовать в «Евровидении-2016» из-за денег - Газета.Ru

The table is quite simple, but maybe I've missed something:

    CREATE TABLE IF NOT EXISTS `ReadingGrabCache` (
      `grab_id` int(11) NOT NULL AUTO_INCREMENT,
      `url` varchar(255) CHARACTER SET latin1 NOT NULL,
      `images` text CHARACTER SET latin1 NOT NULL,
      `title` text COLLATE utf8_bin NOT NULL,
      `description` text COLLATE utf8_bin NOT NULL,
      `all_images` longtext CHARACTER SET latin1,
      PRIMARY KEY (`grab_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=141 ;
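
One guess at how a title can print fine and still come back broken, offered as a sketch and not a diagnosis: if some layer between the script and the table treats the UTF-8 bytes as Latin-1 and encodes them a second time, the stored value ends up double-encoded. The snippet below only simulates that with Encode; it does not touch the database, the helper names `double_encode` and `undo_double_encode` are made up for illustration, and the short title is a stand-in for the real one.

```perl
use strict;
use warnings;
use utf8;                          # literals in this file are UTF-8 source
use Encode qw(encode decode);

# Hypothetical helper: encode characters to UTF-8, then mimic a layer
# that misreads those bytes as Latin-1 and encodes them to UTF-8 again.
sub double_encode {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Hypothetical helper: undo one round of that damage.
sub undo_double_encode {
    my ($bytes) = @_;
    my $latin1_view = decode('UTF-8', $bytes);
    return decode('UTF-8', encode('ISO-8859-1', $latin1_view));
}

my $title  = "Газета.Ru";          # stand-in for the decoded_content title
my $stored = double_encode($title);

printf "characters in title: %d\n", length($title);
printf "bytes if encoded once: %d\n", length(encode('UTF-8', $title));
printf "bytes after double encoding: %d\n", length($stored);
print  "round trip recovers title: ",
       (undo_double_encode($stored) eq $title ? "yes" : "no"), "\n";
```

If the byte length of the stored title looks closer to the third number than the second, double encoding would be worth ruling out, for example by checking what character set the database connection itself is using.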
