When I see a statement like this one (from the OP), I immediately suspect serious problems ahead:

The database server encoding is UTF-8, and the script is in cp1251 (windows-1251). After connecting to the database (using DBI) I set the database client encoding to WIN1251, so to tell the database that I want the output to be in this encoding.

Regardless of what you might believe about the character content of text fields in the database, it seems imprudent to assume that every UTF-8 character stored in a given table is going to have a mapping to CP1251.

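If your code is working "perfectly", UTF-8 characters in the database that happen to be non-existent in CP1251 will, at best, all be rendered as "?" in the output from your Perl script (and if the server itself does the conversion, it may refuse with an error instead of substituting anything). Maybe that's ok for you, but I wouldn't like it.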

I hope the previous replies have been helpful. I know how tricky it can be to handle DB character encodings "the right way" - in fact, I'd be tempted (for expedience) to just take stuff from the DB as-is and do whatever needs to be done via the Encode module in order to turn it into something else.

E.g., maybe stuff from the DB would just look like "raw bytes" when the Perl script first sees it, so "decode" that from UTF-8 into Perl's internal character strings; then "encode" that into CP1251 for output or whatever else there is to do.
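
Here's a minimal sketch of that decode-then-encode idea, just to make it concrete. It is not the OP's code: the $raw_bytes value is a hypothetical stand-in for a field fetched from the DB as UTF-8 bytes, and I deliberately put one character in it (U+2603) that CP1251 doesn't have, so you can see the difference between a lenient encode and a strict one.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for a field fetched as raw UTF-8 bytes:
# Cyrillic "Привет", a space, then U+2603 (snowman), which CP1251 lacks.
my $raw_bytes = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82 \xE2\x98\x83";

# Step 1: bytes -> Perl character string; die on malformed UTF-8.
my $chars = decode('UTF-8', $raw_bytes,
                   Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Step 2a: lenient encode; characters with no CP1251 mapping become "?".
my $lossy = encode('cp1251', $chars);

# Step 2b: strict encode; dies instead of silently losing data.
my $strict = eval {
    encode('cp1251', $chars, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $strict_error = $@;

# Show the resulting bytes in hex so the output is terminal-independent.
printf "lossy bytes: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $lossy;
print "strict encode failed: ", ($strict_error ? "yes" : "no"), "\n";
```

Doing the conversion in the script like this at least lets you choose what happens to unmappable characters (substitute, die, or something else via the other Encode fallback flags), rather than leaving that decision to the client-encoding setting.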


In reply to Re: Malformed UTF-8 character error after fetching data from Postgresql by graff
in thread Malformed UTF-8 character error after fetching data from Postgresql by nihiliath
