Hi monks,

I have a legacy application that uses Postgres to store some data, and its database was created with the WIN1252 charset for legacy reasons. By now the application should work only with UTF-8 Perl strings internally, but there may be places where it doesn't. My problem is that when I insert text containing German umlauts into the database, UTF-8 bytes get stored instead of the proper German umlaut.

I debugged the problem, and inside the application the strings all look as expected. The root cause seems to be that DBD::Pg encodes the data as UTF-8 instead of WIN1252 before transferring it to the database. This surprises me, because the client encoding is automatically detected as WIN1252, and the fact that DBD::Pg can produce valid UTF-8 suggests that my strings are in fact valid Perl strings.
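To make the symptom concrete, here is a minimal sketch using only the core Encode module, with no database involved. It assumes the data on the wire is the UTF-8 serialization of the string and that the receiving side reads those bytes as WIN1252 (`cp1252` in Encode); the helper `hex_of` is just a name I made up for the illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render each character of a string as two hex digits.
sub hex_of {
    my ($string) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $string;
}

# A Perl character string holding one character, U+00E4 (a with umlaut).
my $text = "\x{E4}";

# The bytes produced when the string is serialized as UTF-8.
my $utf8_bytes = encode('UTF-8', $text);

# The single byte a WIN1252 connection would expect for the same character.
my $cp1252_bytes = encode('cp1252', $text);

# What a WIN1252 reader makes of the UTF-8 bytes: two separate characters.
my $misread = decode('cp1252', $utf8_bytes);

print "UTF-8 bytes:    ", hex_of($utf8_bytes),   "\n";
print "WIN1252 bytes:  ", hex_of($cp1252_bytes), "\n";
print "misread length: ", length($misread),      "\n";
```

The one umlaut becomes two bytes in UTF-8, and a WIN1252 reader treats each of those bytes as a character of its own, which matches what I see stored in the database.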

If I change my application to set the client encoding to UTF-8, or manually encode my strings to WIN1252, everything works as expected: in both cases I get valid German umlauts in the database. Both work, of course, because if I tell the connection it's UTF-8, the server can recode properly to WIN1252, and if I send WIN1252 myself, the server doesn't change anything and stores the bytes 1:1.
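The manual-encoding workaround looks roughly like the sketch below. The helper name `to_win1252` is hypothetical, and the snippet only shows the encoding step with the core Encode module; the DBD::Pg call that would receive the result is left out.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into WIN1252 bytes.
# FB_CROAK makes encode() die on characters WIN1252 cannot represent
# instead of silently substituting them; LEAVE_SRC keeps the input intact.
sub to_win1252 {
    my ($text) = @_;
    return encode('cp1252', $text, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# "Gruesse" spelled with u-umlaut (U+00FC) and sharp s (U+00DF).
my $bytes = to_win1252("Gr\x{FC}\x{DF}e");
printf "%d bytes: %s\n", length($bytes), join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

# A character outside WIN1252 (here U+4E2D) is rejected rather than mangled.
my $ok = eval { to_win1252("\x{4E2D}"); 1 };
print "unmappable character rejected\n" unless $ok;
```

The resulting byte string is what I then pass as the bind value. In my setup those bytes arrive in the database unchanged, but I can't say whether that holds for every DBD::Pg version.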

From my understanding, if DBD::Pg automatically detects a client encoding of WIN1252, it should encode the data it sends as WIN1252 rather than UTF-8. But apparently I'm wrong: the same behavior exists on a Windows host, where I just never noticed it because the target database there was created as UTF-8.

Is it expected behavior that DBD::Pg encodes UTF-8 Perl strings to UTF-8 bytes before sending them to the server, regardless of the (automatically detected) client encoding? Does this mean that I simply need to always set the client encoding to UTF-8 if I'm sure I have valid UTF-8 Perl strings internally?

Thanks for your wisdom!


In reply to DBD::Pg encodes Perlstring to UTF-8 bytes instead of WIN1252 regardless client encoding by Pickwick
