Dear monks,

Many moons ago I wrote a Perl script for data entry into PostgreSQL database over a web GUI. The system ran on Ubuntu server 14.04 and it never had any issues with encoding. What was posted from a web form had encoding that would make it readable from PostgreSQL's shell client.

The system however needs to be upgraded to 18.04 and here's where the issue arose. Now all the output I get from DBD:Pg is in ISO8859. The database itself is in UTF8, the server and client both report UTF8 as encoding. I can run queries interactively or from one shot shell commands and receive the data in correct encoding. But not if used from DBI.

I installed a fresh 18.04 installation to debug the issue and the problem persists. Here's an example of the issue:

First creation of a demo table:

demo=> create table demo (content varchar); CREATE TABLE demo=> \d demo Table "public.demo" Column | Type | Collation | Nullable | Default ---------+-------------------+-----------+----------+--------- content | character varying | | | demo=> insert into demo (content) values ('äöåÄÖÅ'); INSERT 0 1 demo=> insert into demo (content) values ('aaaaaa'); INSERT 0 1 demo=> select * from demo; content --------- äöåÄÖÅ aaaaaa (2 rows)

So far so good. If the last select query is redirected to a text file, examining it with a hex editor reveals it's UTF8, file-command agrees.

Next we try the same with DBI:

use DBI qw(:utils); $DBcon=DBI->connect("DBI:Pg:dbname=demo", "demo", "demo-pass"); $DBhandle=$DBcon->prepare("SELECT * FROM demo"); $DBhandle->execute(); if ($DBI::rows > 0) { print $_->{'content'}." - ".data_string_desc($_->{'content'})."\n" w +hile $_ = $DBhandle->fetchrow_hashref(); } $DBcon->disconnect();
Instead of replicating what came out in PostgreSQL's interactive shell, we get:
������ - UTF8 on, non-ASCII, + 6 characters 12 bytes aaaaaa - UTF8 on, ASCII, 6 characters 6 bytes

If redirected to a text file, hex editor shows that this isn't UTF8 and again file-command agrees. If I insert any data into the database with DBI, it will be entered as double-UTF8'd. The system has PostgreSQL 10, Perl 5.26.1, DBI 1.640 and DBD::pg 3.7.0.

So dear monks, what's going on?


In reply to DBD::pg considers input and output ISO8859 by apz

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.