Here is an example of the code where the issue comes from:

```perl
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);
use Encode;

my $ua = LWP::UserAgent->new;

# Define user agent type
$ua->agent('Mozilla/8.0');

# Request object
my $req = GET 'http://www.gazeta.ru/culture/2016/04/22/a_8191769.shtml';

# Make the request
my $res = $ua->request($req);

binmode STDOUT, ":utf8";
print "Content-Type: text/html; charset=utf-8 \n\n";

if ($res->is_success) {
    my $title;
    $res->decoded_content =~ /<title>(.+?)<\/title>/ and $title = $1;

    # prints correctly here!
    print "GOT TITLE: $title \n";

    # $DB is the application's database object (set up elsewhere, not shown)
    $DB->table("ReadingGrabCache")->add({ title => $title, url => "Foo" });
    my $test = $DB->table("ReadingGrabCache")->select({ url => "Foo" })->fetchrow_hashref;

    # buggered content here
    print "BLA: $test->{title} \n<br>";
}
else {
    print $res->status_line . "\n";
}
```


The DB module is encoding insensitive (i.e. it's not doing any kind of conversion), so I'm confused how the title can be fine here and then broken when it's read back :(
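A minimal sketch, using only the core Encode module, of how a layer that does no conversion can still break the text. It assumes (hypothetically) that the DB layer stores the UTF-8 bytes of the decoded title and hands those same bytes back without decoding them. `decoded_content` returns a decoded character string, and the three-letter Cyrillic string below stands in for the real title.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A decoded character string, like the one decoded_content returns.
my $title = "\x{0420}\x{0443}\x{043C}";   # "Рум", standing in for the real title

# Assumption: an encoding-unaware storage layer keeps the UTF-8 *bytes*
# of the string and returns them unchanged, never decoding them.
my $stored  = encode('UTF-8', $title);
my $fetched = $stored;   # what comes back from the "database"

printf "characters in title:   %d\n", length $title;    # 3
printf "characters in fetched: %d\n", length $fetched;  # 6 (each byte counts as a character)

# Decoding the fetched bytes once restores the original character string.
my $restored = decode('UTF-8', $fetched);
print $restored eq $title ? "round trip OK\n" : "round trip broken\n";
```

If the fetched value behaves like `$fetched`, printing it through the `:utf8` layer encodes every byte a second time, which would produce garbled output; decoding it once before printing would avoid that.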

Checking it in phpMyAdmin also shows the issue:

Румыния не будет участвовать в «Евровидении-2016» из-за денег - Газета.Ru

The table is quite simple... but maybe I've missed something:

```sql
CREATE TABLE IF NOT EXISTS `ReadingGrabCache` (
  `grab_id` int(11) NOT NULL AUTO_INCREMENT,
  `url` varchar(255) CHARACTER SET latin1 NOT NULL,
  `images` text CHARACTER SET latin1 NOT NULL,
  `title` text COLLATE utf8_bin NOT NULL,
  `description` text COLLATE utf8_bin NOT NULL,
  `all_images` longtext CHARACTER SET latin1,
  PRIMARY KEY (`grab_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=141;
```
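One way to reason about what ends up in the utf8 `title` column is sketched below. It assumes, hypothetically, that the client connection uses latin1 (MySQL's default before 8.0) while the column is utf8: the server then reads each incoming UTF-8 byte as a separate latin1 character and re-encodes it, so the column holds doubly encoded text. Perl's `latin1` is only an approximation of MySQL's latin1 (which is closer to cp1252), but for these particular bytes the lengths and the round trip come out the same.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $title = "\x{0420}\x{0443}\x{043C}";   # "Рум", standing in for the real title

# Bytes the client sends: the UTF-8 encoding of the title.
my $sent = encode('UTF-8', $title);

# Assumption: latin1 connection, utf8 column. Each byte is read as one
# latin1 character and re-encoded as UTF-8, doubling the encoding.
my $in_column = encode('UTF-8', decode('latin1', $sent));

# Reading back over the same latin1 connection reverses that step, so the
# client gets the original UTF-8 bytes back, still undecoded.
my $read_back = encode('latin1', decode('UTF-8', $in_column));

printf "bytes sent:      %d\n", length $sent;        # 6
printf "bytes in column: %d\n", length $in_column;   # 12
print $read_back eq $sent ? "read back as raw UTF-8 bytes\n" : "mismatch\n";
```

Under these assumptions the column stores twice the bytes it should, which a utf8-aware viewer would render as mojibake, while the Perl side still receives byte strings rather than decoded characters.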

In reply to Re^4: UTF8 issue when getting website via LWP::UserAgent in Perl by ultranerds
