in reply to Re^4: A UTF8 round trip with MySQL
in thread A UTF8 round trip with MySQL
The string is stored internally with bytes > 128, but without UTF-8 flag turned on, but Perl still understands this string.
Yes, because it's stored in the default 8-bit encoding, probably Latin-1. This assumes you're not using the utf8 pragma and that your script file really is saved in that 8-bit encoding.
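To see this for yourself, here is a minimal sketch. The sample string "caf\xE9" is my own stand-in (not the string from the original post); it just needs one character above 127 written as a single Latin-1 byte.

```perl
use strict;
use warnings;

# Hypothetical sample: 'café' with the é given as the single Latin-1 byte 0xE9.
my $string = "caf\xE9";

# utf8::is_utf8() reports the internal UTF-8 flag; it is available
# without loading the utf8 pragma.
my $flagged = utf8::is_utf8($string) ? 1 : 0;

# length() counts characters, and Perl treats each byte here as one character.
my $length = length $string;

print "flag=$flagged length=$length\n";   # prints: flag=0 length=4
```

So Perl handles the string fine; it is just held in the 8-bit form, with the flag off.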
DBD::mysql does not recognise this as UTF-8 (because of the missing UTF-8 flag), so accented characters are stripped.
No, DBD::mysql will currently assume the string is UTF-8 anyway, but since it's actually Latin-1, the MySQL database will (in my experience) truncate the string at the first accented character. In other words, that value in the database will end up as "latin-1 ".
utf8::upgrade($string) turns on the flag
And it converts the string to UTF-8 first. At that point you're guaranteed that the internal encoding of $string really is UTF-8. utf8::upgrade() is a no-op if the string is already flagged as UTF-8, so you can always safely use it when your strings are correctly marked.
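A rough sketch of what that looks like, again using a made-up sample string. I use bytes::length() here only to peek at the internal representation; that is for illustration, not something you'd normally want in real code.

```perl
use strict;
use warnings;
use bytes ();   # load bytes:: functions without enabling byte semantics

# Hypothetical sample: 'café' stored in the default 8-bit form.
my $string = "caf\xE9";
my $copy   = $string;                        # untouched copy for comparison

my $before = utf8::is_utf8($string) ? 1 : 0;
utf8::upgrade($string);                      # convert internal form to UTF-8, set the flag
my $after  = utf8::is_utf8($string) ? 1 : 0;

utf8::upgrade($string);                      # second call changes nothing

my $same  = ($string eq $copy) ? 1 : 0;      # still the same characters
my $chars = length $string;                  # character count is unchanged
my $bytes = bytes::length($string);          # é now takes two bytes internally

print "before=$before after=$after same=$same chars=$chars bytes=$bytes\n";
# prints: before=0 after=1 same=1 chars=4 bytes=5
```

The string compares equal before and after: only the internal storage changed, not the characters.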
Would using $string = Encode::decode_utf8($string) also work in this case?
No, because the string isn't in UTF-8 but in the default 8-bit encoding.
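A small sketch of why, with a hypothetical sample string of my own. A lone 0xE9 byte followed by a space is not a valid UTF-8 sequence, so with the default settings Encode substitutes the replacement character U+FFFD instead of giving you é. Decoding with the encoding the bytes are really in (ISO-8859-1 here, which is my assumption about the input) keeps the character.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: Latin-1 bytes, with é as the single byte 0xE9.
my $latin1 = "caf\xE9 au lait";

# Wrong tool: treats the bytes as UTF-8, and 0xE9 followed by a space is malformed.
my $wrong = Encode::decode_utf8(my $copy1 = $latin1);

# Decoding with the encoding the bytes are actually in keeps the é.
my $right = Encode::decode('ISO-8859-1', my $copy2 = $latin1);

my $has_replacement = ($wrong =~ /\x{FFFD}/) ? 1 : 0;
my $kept_accent     = ($right eq "caf\x{E9} au lait") ? 1 : 0;

print "decode_utf8 produced U+FFFD: $has_replacement\n";
print "ISO-8859-1 decode kept the accent: $kept_accent\n";
```

The copies passed to the decode calls are just a precaution, since Encode can modify its input in place depending on the CHECK argument.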