I do think the table being latin1 is part of the problem. On the other hand, the application that fills the table seems to use a reasonable encoding (UTF-8). If you change the table to something unicodey, that application most probably will NOT automagically insert one Unicode character instead of the current 4 bytes.
Probably the easier solution is to check for bytes between 0x80 and 0x9F: these are not defined in ISO 8859-1 (the "official" Latin-1), but they do turn up as continuation bytes in UTF-8. If they are not used otherwise in your variant of Latin-1, it might be feasible to detect the mis-encoded rows and repair them with Encode::decode.
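A minimal sketch of that byte-range check (the helper name is mine, and the sample strings are illustrations, not your data):

```perl
use strict;
use warnings;

# Returns true if the byte string contains bytes in the 0x80-0x9F range.
# ISO 8859-1 leaves that range to the (rarely stored) C1 control codes,
# so their presence is a strong hint that the data is really UTF-8.
sub has_c1_bytes {
    my ($bytes) = @_;
    return $bytes =~ /[\x80-\x9F]/;
}

my $utf8_bytes   = "\xC3\x84";   # "Ä" encoded as UTF-8; 0x84 falls in the C1 range
my $latin1_bytes = "\xC4";       # "Ä" in Latin-1; no C1 bytes at all

print has_c1_bytes($utf8_bytes)   ? "looks like UTF-8\n" : "plain Latin-1\n";
print has_c1_bytes($latin1_bytes) ? "looks like UTF-8\n" : "plain Latin-1\n";
```

Note this heuristic can misfire if your data legitimately uses a Latin-1 variant (e.g. Windows-1252) that assigns printable characters to 0x80-0x9F.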
What happens if you insert something like
{
use Encode qw(decode :fallbacks);
$text = decode('UTF-8', $text, FB_WARN);
}
after reading $text from the database?
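To illustrate what that call does (with made-up byte strings, not your table contents): valid UTF-8 decodes cleanly, while invalid byte sequences trigger the fallback, so FB_WARN both tells you which rows are affected and still returns a usable string.

```perl
use strict;
use warnings;
use Encode qw(decode :fallbacks);

# Valid UTF-8: two characters stored as four bytes, decoded without complaint.
my $good = decode('UTF-8', "\xC3\xA4\xC3\xB6", FB_WARN);   # "äö"

# Invalid as UTF-8 (this is "äö" in Latin-1): FB_WARN emits a warning and
# substitutes U+FFFD for each bad byte; FB_CROAK would die instead.
my $bad = decode('UTF-8', "\xE4\xF6", FB_WARN);
```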