I am at my wits end moving data from MySQL to Postgres (long story).

I inherit the MySQL data dump. I am pretty certain encoding of the original db is all messed up; it is supposed to be utf8, is actually set as latin1, and has data in it that would be appropriate for utf8. Its first few lines are like so

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */; /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */ +; /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */; /*!40101 SET NAMES utf8 */; CREATE DATABASE /*!32312 IF NOT EXISTS*/ `dbname` /*!40100 DEFAULT + CHARACTER SET latin1 */; USE `dbname`; DROP TABLE IF EXISTS `foo`; /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE `foo` ( ..

Besides the above, I know nothing more about the encoding. The new Pg database is utf8, and is in my control. Ideally, I would like to set up my routine so the Pg database is all correctly encoded in utf8. In other words, whatever the MySQL encoding, I want it to end up in Pg as utf8.

There are several complicated data types in the MySQL db, so I am unable to simply dump from MySQL and restore in Pg. I am now trying to simply query one db, and insert the values in the other db using Perl DBI. I keep on getting the error of the type 'invalid byte sequence for encoding "UTF8": 0x93 at...'. Here are my steps:

  1. dump data from MySQL (actually, I inherit the dump, so I have no control over how the dump is actually created)
  2. $ iconv -f latin1 -t UTF-8 in.sql > in_utf8.sql
  3. $ mysql < in_utf8.sql
  4. $ perl sync.pl

bam! error. The `sync.pl` script is a pretty simple

query table in mysql while (my ($col1, $col2... $coln) = $sth->fetch row_array) { ## I believe this is where I should be converting the $col1..n ## values to utf8, but don't know how. insert into Pg }

I have also tried the following variation -- changed step #2 above to sed 's/latin1/utf8/g' in.sql > in_utf8.sql and then loaded that in mysql in step #3 above before trying to insert it in Pg. Still, no joy. Similar utf8 encoding error.

Suggestions?



when small people start casting long shadows, it is time to go to bed

In reply to dealing with encoding while converting data from MySQL to Postgres by punkish

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.