http://qs1969.pair.com?node_id=1162873


in reply to Re^5: UTF8 issue when getting website via LWP::UserAgent in Perl
in thread UTF8 issue when getting website via LWP::UserAgent in Perl

So, I know it's all confusing. Took me forever. But it's actually really simple. A string of bytes is nothing; it's just binary data. You have to know which encoding it's supposed to be in and tell your code so: decode when coming in from binary, encode when going back out to it. The raw stuff doesn't know (well, some encodings do have a BOM, but it's not something you can rely on here). Your DBI/DBD driver can do the encode/decode two-step for you automatically, as I suggested (it might work even if the table definition is wrong, but it's best to ensure the two agree). :P The setting to check depends on your driver; for DBD::mysql it's mysql_enable_utf8.
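A minimal sketch of the two-step done by hand with the core Encode module, in case seeing it helps. The sample string and the variable names are made up for illustration; in real code the bytes would come from the HTTP response or the database handle.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as they might arrive off the wire: "café" encoded as UTF-8.
# (Illustrative data only.)
my $bytes = "caf\xC3\xA9";

# Coming in: bytes -> Perl character string.
my $chars = decode('UTF-8', $bytes);

# length() counts bytes before decoding and characters after it.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# Going out: characters -> bytes again.
my $out = encode('UTF-8', $chars);
print "round trip ok\n" if $out eq $bytes;
```

The point is only that the decode happens once on the way in and the encode once on the way out; everything in between works on characters.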

Update: s/simply/simple/;

Re^7: UTF8 issue when getting website via LWP::UserAgent in Perl
by afoken (Chancellor) on May 12, 2016 at 20:50 UTC
Re^7: UTF8 issue when getting website via LWP::UserAgent in Perl
by ultranerds (Hermit) on May 12, 2016 at 15:46 UTC
    Thanks for the info. Man, this is a PITA :S Think I may have to take a break, and come back to it tomorrow.

    There is definitely something up, because even with a basic DBI connection it still gets messed up:

    my $dsn = "DBI:mysql:database=$db_cfg->{database};host=$db_cfg->{host};port=3307";
    my $dbh = DBI->connect($dsn, $db_cfg->{login}, $db_cfg->{password});
    $dbh->{mysql_enable_utf8} = 1;
    my $sth = $dbh->prepare( "INSERT INTO ReadingGrabCache SET title = ?" );
    $sth->execute( $title ) or die $DBI::errstr;
    Eugh :/
      Did you know you're not checking the status of the connect or the prepare? And if you turn on RaiseError in the connect, you'll automatically check all DBI methods, and you won't even need the 'or die' on the execute.
        That was just a very quick example I put together, to test the theory about how the data ended up in the table :)
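Putting the suggestions in this sub-thread together, a hedged sketch of the same insert with RaiseError turned on and mysql_enable_utf8 passed in the connect call. As far as I recall, the DBD::mysql documentation says the flag only takes full effect when given to connect(), and that setting it afterwards also needs a SET NAMES utf8; treat that as something to verify against your driver version. The %db_cfg values and the insert_title name are placeholders, not anything from the original code.

```perl
use strict;
use warnings;

# Placeholder connection settings; substitute your own.
my %db_cfg = (
    database => 'test',
    host     => 'localhost',
    port     => 3307,
    login    => 'user',
    password => 'secret',
);

my $dsn = "DBI:mysql:database=$db_cfg{database};host=$db_cfg{host};port=$db_cfg{port}";

# RaiseError makes every DBI call die on failure, so no per-call checks
# are needed. mysql_enable_utf8 is given at connect time rather than
# being set on the handle afterwards.
my %attr = (
    RaiseError        => 1,
    PrintError        => 0,
    mysql_enable_utf8 => 1,
);

# Hypothetical helper: inserts one title, which should already be a
# decoded character string, not raw bytes.
sub insert_title {
    my ($title) = @_;
    require DBI;    # loaded here so the settings above can be inspected without a driver
    my $dbh = DBI->connect($dsn, $db_cfg{login}, $db_cfg{password}, \%attr);
    my $sth = $dbh->prepare('INSERT INTO ReadingGrabCache SET title = ?');
    $sth->execute($title);
    $dbh->disconnect;
    return;
}
```

Calling insert_title($title) needs a reachable MySQL server and the ReadingGrabCache table from the example above; nothing connects until the sub is called.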