
Thanks for the tips. It does seem that Perl 5.8 is *much* better at handling Unicode strings. Doing encode_entities("a \x{9B} \x{263A}") in 5.6 yields:
a › ☺
In 5.8 it yields:
a › ☺
which is what it should be.
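
Why 5.8 gets this right: there, "a \x{9B} \x{263A}" is a string of five characters, so each non-ASCII character can become exactly one numeric character reference. The core-only sketch below illustrates that idea with a hypothetical numeric_entities() helper. It is not HTML::Entities itself, and it assumes decimal references, which may differ from the exact format encode_entities() emits.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::Entities: it mimics the idea of
# encode_entities() for non-ASCII input by turning every character outside
# \x00-\x7E into a decimal numeric character reference.
sub numeric_entities {
    my ($str) = @_;
    (my $out = $str) =~ s/([^\x00-\x7E])/sprintf('&#%d;', ord $1)/ge;
    return $out;
}

my $text = "a \x{9B} \x{263A}";
print length($text), "\n";            # 5: 'a', ' ', U+009B, ' ', U+263A
print numeric_entities($text), "\n";  # a &#155; &#9786;
```

Because \x{263A} is one character rather than three bytes, it turns into a single reference instead of several garbled ones.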

However, the string coming from the database (MySQL) still doesn't print correctly. I wasn't familiar with the Encode module you mentioned, but when I Dump the string I pull from the database (using Devel::Peek), I can see that it lacks the UTF8 flag that the string I create manually has. I tried:
use Encode qw(decode_utf8);
my $str = decode_utf8($data);
which worked splendidly and did exactly what I wanted. Do you know if this is standard operating procedure (SOP) when working with MySQL? That is, will I have to do this on every string I pull from the database?
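
A minimal way to see the difference the UTF8 flag makes, assuming (as the Devel::Peek output suggests) that the driver returns raw UTF-8 bytes. The literal below is a stand-in for a value fetched from MySQL; utf8::is_utf8() reports the same flag that Dump() shows in its FLAGS line.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use Devel::Peek qw(Dump);

# Stand-in for a value fetched from MySQL: "a " followed by the three
# UTF-8 bytes of U+263A, with no UTF8 flag (assumed driver behaviour).
my $data = "a \xE2\x98\xBA";
Dump($data);                      # to STDERR: UTF8 absent from FLAGS
printf "before: flag=%d length=%d\n", (utf8::is_utf8($data) ? 1 : 0), length $data;
# before: flag=0 length=5

my $str = decode_utf8($data);     # UTF-8 bytes -> Perl character string
Dump($str);                       # to STDERR: FLAGS now include UTF8
printf "after:  flag=%d length=%d\n", (utf8::is_utf8($str) ? 1 : 0), length $str;
# after:  flag=1 length=3
```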

Re^3: HTML::Entities and multi-byte characters
by iburrell (Chaplain) on Sep 13, 2004 at 22:07 UTC
    You will probably have to turn the strings that come from the database into Unicode strings yourself.

    Some drivers, such as DBD::Pg, can flag strings as Unicode. I don't know whether DBD::mysql supports this. I have seen three different ways to control the encoding of strings: DBD::Pg has a database-handle attribute, DBD::Oracle uses the NLS_LANG environment variable, and others rely on the database encoding. Unfortunately, this is not well documented.

      I did a bit of googling and discovered that DBD::mysql doesn't support this, but there is some ongoing discussion of how it should be emulated: Google Groups Thread. We use our own simple DBH abstraction layer, so I might just add the decode_utf8() conversion at that level...
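
A sketch of what that wrapper-level conversion might look like. decode_row() is a hypothetical helper, not part of DBI or DBD::mysql; it assumes every text column is stored as UTF-8, and binary (BLOB) columns would need to be skipped in practice.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper for a home-grown DBH wrapper: decode every column of a
# row hashref (as returned by DBI's fetchrow_hashref) from UTF-8 bytes into
# Perl character strings. Assumes all columns hold UTF-8 text.
sub decode_row {
    my ($row) = @_;
    return $row unless $row;              # fetchrow_hashref returns undef when done
    for my $value (values %$row) {        # values() aliases, so assignment updates the hash
        next unless defined $value;       # leave SQL NULLs alone
        next if utf8::is_utf8($value);    # already a character string
        $value = decode_utf8($value);
    }
    return $row;
}

# Stand-in for a row fetched from DBD::mysql (raw bytes, no UTF8 flag).
my $row = decode_row({ id => '1', name => "a \xE2\x98\xBA", note => undef });
print length($row->{name}), "\n";   # 3
```

Calling it on each hashref the wrapper fetches would keep the rest of the code unaware of the conversion.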