in reply to Malformed UTF-8 character error after fetching data from Postgresql

When I see a statement like this one (from the OP), I immediately suspect serious problems ahead:

The database server encoding is UTF-8, and the script is in cp1251 (windows-1251). After connecting to the database (using DBI) I set the database client encoding to WIN1251, so to tell the database that I want the output to be in this encoding.

Regardless of what you might believe about the character content of text fields in the database, it seems imprudent to assume that every UTF-8 character stored in a given table is going to have a mapping to CP1251.

If your code is working "perfectly", UTF-8 characters in the database that have no equivalent in CP1251 will all be rendered as "?" in the output from your perl script. Maybe that's OK for you, but I wouldn't like it.
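Perl's Encode module shows the same substitution behavior, so you can see the problem in isolation. A minimal sketch (the snowman character is just an arbitrary example of something with no CP1251 equivalent):

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character that maps to CP1251 (Cyrillic "Zhe", U+0416)
# and one that does not (U+2603 SNOWMAN).
my $str = "\x{0416}\x{2603}";

# By default, encode() silently substitutes unmappable characters,
# so the snowman comes out as a literal "?".
my $bytes = encode('cp1251', $str);

# To fail loudly instead of substituting, pass a strict check mode:
# my $strict = encode('cp1251', $str, Encode::FB_CROAK);  # dies
```

Whether silent substitution or a hard failure is preferable depends on the application, but it should be a deliberate choice rather than a surprise.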

I hope the previous replies have been helpful. I know how tricky it can be to handle DB character encodings "the right way" - in fact, I'd be tempted (for expedience) to just take stuff from the DB as-is and do whatever needs to be done via the Encode module in order to turn it into something else.

E.g., maybe stuff from the DB would just look like "raw bytes" when the perl script first sees it, so "decode" that into Perl's internal string form; then "encode" that into CP1251 for output or whatever else needs doing.
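That round trip might look something like the sketch below. The connection string, table, and column names are hypothetical placeholders, and it assumes the server encoding is UTF-8 with no client-side conversion in effect:

```perl
use strict;
use warnings;
use DBI;
use Encode qw(decode encode);

# Hypothetical connection details - adjust to your own setup.
my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                       { RaiseError => 1 });

my ($name) = $dbh->selectrow_array(
    'SELECT name FROM people WHERE id = 1');

# $name arrives as raw UTF-8 octets; turn them into Perl characters.
my $text = decode('UTF-8', $name);

# Re-encode for CP1251 output; by default, characters with no
# CP1251 equivalent are substituted with "?".
my $win1251 = encode('cp1251', $text);
print $win1251;
```

The advantage of doing the conversion in Perl rather than via `SET client_encoding` is that you keep control over what happens to unmappable characters.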


Re^2: Malformed UTF-8 character error after fetching data from Postgresql
by Anonymous Monk on Sep 18, 2014 at 11:01 UTC
    Yeah, there will be a problem if someone enters data into the DB with characters that aren't supported in CP1251. That shouldn't normally happen... For now this solution solves my problem.