in reply to dealing with encoding while converting data from MySQL to Postgres

Besides the above, I know nothing more about the encoding.

Then you need to learn more about the encoding. I've seen enough crazy things (like UTF-8 stored in latin-1 tables) to strongly advise you not to trust a DB blindly, but to look at the data that comes out of it and find out which encoding(s) it actually uses. Yes, that takes time and effort, but if you don't invest it now, you'll spend ten times the effort fixing things later on.
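One way to look at the data is to fetch a few values as raw bytes (without asking the driver to decode them) and check them in Perl. This is a minimal sketch, not a complete detector. `classify_bytes` and `hex_bytes` are hypothetical helper names, and passing a strict UTF-8 decode only shows that the bytes *could* be UTF-8. Double-encoded data (UTF-8 stored in a latin-1 table and converted again) is still valid UTF-8, so read the hex output too.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hex dump of a byte string, e.g. "63 61 66 E9".
sub hex_bytes { return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $_[0] }

# Rough classification of a raw byte string fetched from the database.
sub classify_bytes {
    my ($bytes) = @_;
    return 'ascii' unless $bytes =~ /[^\x00-\x7F]/;
    my $copy = $bytes;    # decode() with a CHECK argument may consume its input
    # FB_CROAK makes decode die on malformed UTF-8 instead of substituting.
    my $ok = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
    return $ok ? 'valid UTF-8' : 'not valid UTF-8';
}

# Replace these samples with values fetched from the real table.
for my $sample ('plain', "caf\xC3\xA9", "caf\xE9") {
    printf "%-18s %s\n", hex_bytes($sample), classify_bytes($sample);
}
```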

$ iconv -f latin1 -t UTF-8 in.sql > in_utf8.sql
$ mysql < in_utf8.sql

That's a pretty bad idea. As you've shown us yourself, the MySQL dump contains metadata about the character encoding, and iconv doesn't change it. Assuming that metadata was accurate before the conversion, it is now certainly wrong. So don't do that. Instead, use the original database (or recreate it from the dump), connect to it, and do all encoding conversion in Perl.
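To see why leftover metadata matters, here is a sketch of one common way it goes wrong: the bytes are already UTF-8, but something downstream still treats them as latin-1 and converts them to UTF-8 a second time. Whether MySQL actually does that second conversion depends on the connection and table character sets, so take this as an illustration of the failure mode, not a reproduction of your setup.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

sub hex_bytes { return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $_[0] }

# "cafe" with U+00E9 (e acute), written as an escape to keep the source ASCII.
my $text = "caf\x{E9}";

# Correct: the characters encoded once as UTF-8.
my $utf8_once = encode('UTF-8', $text);

# Wrong: UTF-8 bytes read back under a latin-1 label, then encoded again.
my $double = encode('UTF-8', decode('iso-8859-1', $utf8_once));

print "once:   ", hex_bytes($utf8_once), "\n";    # 63 61 66 C3 A9
print "double: ", hex_bytes($double), "\n";       # 63 61 66 C3 83 C2 A9
```

The doubled bytes display as `cafÃ©`, the classic symptom of double encoding.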

bam! error. The `sync.pl` script is a pretty simple

while (my ($col1, $col2... $coln) = $sth->fetchrow_array) {
    ## I believe this is where I should be converting the $col1..n
    ## values to utf8, but don't know how.
    insert into Pg
}

You haven't shown us the interesting part where you set up DBD::mysql and DBD::Pg to deal with UTF-8. If you haven't done so, please read the documentation of these two modules regarding the handling of encodings in general, and UTF-8 in particular.

As for converting the encoding, Encode can do all that is left to do after you set up the two DBD modules correctly (which could be nothing at all).
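If the drivers are configured and you still get raw bytes from MySQL, a per-row decode with Encode might look like the sketch below. It assumes you already know the real source encoding (`iso-8859-1` here is only an example), and `decode_row` is a hypothetical helper. NULL columns arrive from DBI as `undef` and are passed through untouched.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Turn one row of raw column bytes into Perl character strings.
# $encoding is whatever the source data really is (found by inspecting it);
# undef (SQL NULL) stays undef.
sub decode_row {
    my ($encoding, @columns) = @_;
    return map { my $v = $_; defined $v ? decode($encoding, $v) : undef } @columns;
}

my @row = decode_row('iso-8859-1', "caf\xE9", undef, 'plain');
printf "%d columns, first has %d characters\n", scalar @row, length $row[0];
```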

while (my ($col1, $col2... $coln) =

I'd simply use `while (my @columns = ...` here instead; it's less typing.
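A sketch of the `@columns` version of the loop. `FakeSth` is a hypothetical stand-in for a DBI statement handle so the example runs without a database; a real handle from `prepare`/`execute` provides the same `fetchrow_array` method. `$insert_sth` in the comment is likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DBI statement handle.
package FakeSth;
sub new { my ($class, @rows) = @_; return bless { rows => [@rows] }, $class }
sub fetchrow_array {
    my $self = shift;
    my $row = shift @{ $self->{rows} } or return;    # empty list ends the loop
    return @$row;
}

package main;

my $sth = FakeSth->new([1, 'alpha', undef], [2, 'beta', 'x']);
my @copied;
# One array instead of ($col1, $col2, ... $coln): works for any column count.
while (my @columns = $sth->fetchrow_array) {
    push @copied, [@columns];    # here you would call $insert_sth->execute(@columns)
}
print scalar(@copied), " rows copied\n";
```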

Encoding problems should be treated like any other programming problem. Look carefully at the input data, decide what the result should be, and trace the data as your program processes it until you reach the point where it deviates from your expectation. Once you've found that point, fix it and continue.
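A small helper for the tracing step: print the length and the ordinal of every character at each stage, so you can see where the data stops matching your expectation. `describe` is a hypothetical name. In the example, length 5 with `C3 A9` means undecoded UTF-8 bytes, while length 4 with `E9` is the decoded text (though the same output could also come from latin-1 bytes, so check what the previous step was supposed to produce).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Length plus hex ordinal of each character, e.g. "len=4: 63 61 66 E9".
sub describe {
    my ($s) = @_;
    return 'undef' unless defined $s;
    return sprintf 'len=%d: %s', length $s, join ' ', map { sprintf '%02X', ord } split //, $s;
}

my $raw   = "caf\xC3\xA9";           # raw bytes, as a driver might return them
my $chars = decode('UTF-8', $raw);   # after decoding
print "raw:   ", describe($raw), "\n";
print "chars: ", describe($chars), "\n";
```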

See also: Character Encodings in Perl.

Re^2: dealing with encoding while converting data from MySQL to Postgres
by punkish (Priest) on Dec 11, 2011 at 04:20 UTC
    Thanks moritz. Adding {mysql_enable_utf8 => 1} and {pg_enable_utf8 => 1} to the connect attributes of the two database handles seems to have solved the problem.

    There were other incompatibilities between the two databases that I had to solve, but now I can move data from one to the other.