
Re^6: UTF8 issue when getting website via LWP::UserAgent in Perl

by Your Mother (Archbishop)
on May 12, 2016 at 15:26 UTC (#1162873)

in reply to Re^5: UTF8 issue when getting website via LWP::UserAgent in Perl
in thread UTF8 issue when getting website via LWP::UserAgent in Perl

So, I know it's all confusing. It took me forever. But it's actually really simple. A string of bytes is nothing; it's just binary data. You have to know what encoding it's supposed to be in, and tell your code so both when coming from binary (decode) and when going back to it (encode). The raw bytes don't know (well, some encodings do have a BOM, but it's not something you can rely on here). Your DBI/DBD driver can do the encode/decode two-step for you automatically, as I suggested (it might work even if the table definition is wrong, but it's best to ensure they agree). :P Examples of the setting to check include:

  • DBD::mysql -> mysql_enable_utf8
    • This attribute determines whether DBD::mysql should assume strings stored in the database are utf8. This feature defaults to off.
  • DBD::SQLite -> sqlite_unicode
    • If the attribute $dbh->{sqlite_unicode} is set, strings coming from the database and passed to the collation function will be properly tagged with the utf8 flag; but this only works if the attribute is set before the first call to a perl collation sequence. The recommended way to activate unicode is to set the sqlite_unicode parameter at connection time.
  • DBD::Pg -> pg_enable_utf8 (integer)
    • DBD::Pg specific attribute. The behavior of DBD::Pg with regards to this flag has changed as of version 3.0.0. The default value for this attribute, -1, indicates that the internal Perl utf8 flag will be turned on for all strings coming back from the database if the client_encoding is set to 'UTF8'. Use of this default is highly encouraged. If your code was previously using pg_enable_utf8, you can probably remove mention of it entirely. :\
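
For what it's worth, here is a minimal sketch of the decode-on-the-way-in, encode-on-the-way-out rule described above, done by hand with the core Encode module. The sample string and the variable names are made up for illustration, and it assumes the incoming bytes really are UTF-8; with one of the driver settings above turned on, the driver should do the equivalent for you at the database boundary.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the raw bytes of "café" in UTF-8, as they might
# arrive from a socket or file. Perl has no idea these are UTF-8.
my $bytes = "caf\xC3\xA9";

# Coming from binary: tell Perl what the bytes are supposed to be.
my $chars = decode('UTF-8', $bytes);

# Going back to binary: turn the characters into bytes again.
my $out = encode('UTF-8', $chars);

printf "bytes in: %d, characters: %d, bytes out: %d\n",
    length($bytes), length($chars), length($out);
```

The byte string is five bytes long but only four characters once decoded, which is the whole difference between "binary data" and "text" in this discussion.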

Update: s/simply/simple/;

Re^7: UTF8 issue when getting website via LWP::UserAgent in Perl
by afoken (Chancellor) on May 12, 2016 at 20:50 UTC
Re^7: UTF8 issue when getting website via LWP::UserAgent in Perl
by ultranerds (Hermit) on May 12, 2016 at 15:46 UTC
    Thanks for the info. Man, this is a PITA :S Think I may have to take a break, and come back to it tomorrow.

    There is definitely something up, because even using a basic DBI connection, it still messes it up:

    my $dsn = "DBI:mysql:database=$db_cfg->{database};host=$db_cfg->{host};port=3307";
    my $dbh = DBI->connect($dsn, $db_cfg->{login}, $db_cfg->{password});
    $dbh->{mysql_enable_utf8} = 1;
    my $sth = $dbh->prepare( "INSERT INTO ReadingGrabCache SET title = ?" );
    $sth->execute( $title ) or die $DBI::errstr;

    Eugh :/
      Did you know you're not checking the status of the connect or the prepare? And if you turn on RaiseError in the connect, you'll automatically check all DBI methods, and you won't even need the 'or die' on the execute.
        That was just a very quick example I put together, to test the theory about how the data ended up in the table :)
