in reply to dealing with encoding while converting data from MySQL to Postgres

This is a tough situation. Personally, I would go through every row and every text column in the original database and check whether the raw bytes are valid UTF-8. Note that Encode::is_utf8() will not tell you that: it only reports perl's internal UTF8 flag, so use a strict decode instead, e.g. eval { Encode::decode("UTF-8", $str, Encode::FB_CROAK) }. If that fails, try Encode::decode("cp1252", $str) ("cp1252" seems to be a good bet for many western languages), and only then insert the result into the postgres database. This will probably leave you with some corrupted entries, which you can figure out how to detect and fix later.
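A minimal sketch of that per-value check, assuming the values arrive from MySQL as raw, undecoded bytes. The helper name bytes_to_text is made up for illustration, and the cp1252 fallback is only the guess described above, not something that can be verified from the data itself:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: takes raw bytes as fetched from the old database
# and returns a Perl character string.
sub bytes_to_text {
    my ($bytes) = @_;
    return unless defined $bytes;    # pass NULLs through as undef

    # Strict decode: croaks on malformed UTF-8, so eval yields undef.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $text if defined $text;

    # Not valid UTF-8: assume Windows-1252 (an assumption, see above).
    # Bytes that cp1252 leaves undefined come out as U+FFFD.
    return decode('cp1252', $bytes);
}
```

The returned string still has to be encoded as UTF-8 on the way into Postgres, either explicitly or by the driver. Rows that went through the fallback branch are worth logging, since those are the ones most likely to need fixing by hand later.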

Re^2: dealing with encoding while converting data from MySQL to Postgres
by Anonymous Monk on Dec 10, 2011 at 12:09 UTC

    ...and even then you may have to watch out for doubly-encoded UTF-8 (maybe the database driver does that, maybe it was inserted into the database that way, maybe ...). You should also look into Text::Iconv for your conversion, as it works with raw bytes and does not care about perl's "utf8 bit".
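One possible way to spot doubly-encoded values, sketched with the core Encode module rather than Text::Iconv. The helper name fix_double_utf8 is hypothetical. The idea is a heuristic: if an already-decoded string can be turned back into single bytes and those bytes are themselves valid UTF-8, it was probably encoded twice. Short strings can match by coincidence, so treat the result as a candidate for review, not a certainty.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: given an already-decoded character string, undo one
# layer of accidental double encoding if there appears to be one.
sub fix_double_utf8 {
    my ($text) = @_;

    # Pure ASCII (or undef) cannot be doubly encoded; return it unchanged.
    return $text unless defined $text && $text =~ /[^\x00-\x7F]/;

    # Map the characters back to single bytes, trying cp1252 first and
    # then latin1; encode croaks if a character has no mapping, and the
    # strict decode croaks if the bytes are not valid UTF-8.
    for my $enc ('cp1252', 'iso-8859-1') {
        my $fixed = eval {
            my $bytes = encode($enc, $text, FB_CROAK | LEAVE_SRC);
            decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC);
        };
        return $fixed if defined $fixed;
    }

    return $text;    # did not look doubly encoded; leave untouched
}
```

Correctly decoded text such as "café" fails the second decode and comes back unchanged, so only values that really look like two layers of UTF-8 are rewritten.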