Here is some example code showing where the issue comes from:
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);
my $ua = LWP::UserAgent->new;
# Define user agent type
$ua->agent('Mozilla/8.0');
# Request object
my $req = GET 'http://www.gazeta.ru/culture/2016/04/22/a_8191769.shtml';
# Make the request
my $res = $ua->request($req);
binmode STDOUT, ":utf8";
print "Content-Type: text/html; charset=utf-8 \n\n";
use Encode;
if ($res->is_success) {
my $title;
$res->decoded_content =~ /<title>(.+?)<\/title>/ and $title = $1;
# prints correctly here!
print "GOT TITLE: $title \n";
$DB->table("ReadingGrabCache")->add( { title => $title, url => "Foo" });
my $test = $DB->table("ReadingGrabCache")->select ( { url => "Foo" })->fetchrow_hashref;
# buggered content here
print "BLA: $test->{title} \n<br>";
} else {
print $res->status_line . "\n";
}
The DB module is encoding-insensitive (i.e. it's not doing any kind of conversion), so I'm confused how the title can be fine here and then broken when it's fetched back :(
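To make the "no conversion" point concrete, here is a minimal sketch of how the title could be inspected before it goes to the DB layer. It assumes (unconfirmed) that the difference between a Perl character string and its UTF-8 bytes is what matters here; the sample string and the variable names `$bytes` and `$back` are made up for illustration, and only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the title pulled out of decoded_content:
# a Perl character string holding six Cyrillic letters.
my $title = "\x{413}\x{430}\x{437}\x{435}\x{442}\x{430}";

# A decoded string: length counts characters and the internal UTF-8 flag is on.
printf "chars=%d is_utf8=%d\n", length($title), utf8::is_utf8($title) ? 1 : 0;

# Encoding explicitly gives the UTF-8 octets a byte-oriented layer would store.
my $bytes = encode('UTF-8', $title);
printf "bytes=%d hex=%s\n", length($bytes), unpack('H*', $bytes);

# Decoding those octets again should give back the original character string.
my $back = decode('UTF-8', $bytes);
print "round trip ok\n" if $back eq $title;
```

Running the same `length` and hex dump on `$test->{title}` after the select would show whether what comes back is characters, UTF-8 bytes, or something that has been encoded twice.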
Checking it in phpMyAdmin also shows the issue:
Румыния не будет участвовать в «Евровидении-2016» из-за денег - Газета.Ru
The table is quite simple... but maybe I've missed something:
CREATE TABLE IF NOT EXISTS `ReadingGrabCache` (
`grab_id` int(11) NOT NULL AUTO_INCREMENT,
`url` varchar(255) CHARACTER SET latin1 NOT NULL,
`images` text CHARACTER SET latin1 NOT NULL,
`title` text COLLATE utf8_bin NOT NULL,
`description` text COLLATE utf8_bin NOT NULL,
`all_images` longtext CHARACTER SET latin1,
PRIMARY KEY (`grab_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=141 ;