Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Dear Perl Monks.
I've used perl in the past and know my way around a bit. But now I've encountered something I don't comprehend any more. Either I am doing something wrong or its a bug. I would apreciate some enlightenment on this example.
I am running Debian Jessie. German UTF-8 Locale. The problem seems to be at least similar on Debian Wheezy. I upgraded before writing this test.
This is perl 5, version 20, subversion 2 (v5.20.2) built for i586-linu +x-gnu-thread-multi-64int (with 42 registered patches, see perl -V for more detail)
My code is as follows:
My output here is:#!/usr/bin/perl use DBI; use strict; use warnings; my $dbh; sub onevaluesql($) { my ($command) = @_; my $sth=$dbh->prepare($command); $sth->execute(); my ($value) = $sth->fetchrow_array(); $sth->finish(); return $value; } sub test($$) { my ($testname,$test) = @_; $dbh->do("insert into test(name,test) values (?,?);",undef,$testname +,$test); my $resulttest = onevaluesql("select test from test where name='$testname';"); if($test eq $resulttest) { print "$testname is successful on DBI readback.\n"; } else { print "$testname failed on DBI readback. Got $resulttest where $te +st was expected.\n"; } $dbh->commit(); my $resulttest2 = `echo "select test from test where name='$testname +';" | psql --html`; print "psql raw results: $resulttest2\n"; if($resulttest2 =~ s/<td align="left">([^<]*)</td>/) { $resulttest2=$1; } if($test eq $resulttest2) { print "$testname is successful on psql readback.\n"; } else { print "$testname failed on psql readback. Got $resulttest where $t +est was expected.\n"; } } $dbh = DBI->connect("dbi:Pg:dbname=articles", "", "", {AutoCommit => 0 +, RaiseError => 1}); print "DBI is version $DBI::VERSION, DBD::Pg is version $DBD::Pg::VERS +ION\n"; print "client_encoding=" . onevaluesql("show client_encoding;") . ", s +erver_encoding=" . onevaluesql("show server_encoding;") . "\n"; system "locale"; print "-------------------\n"; $dbh->do("drop table if exists test;"); $dbh->do("create table test ( name text, test text );"); $dbh->commit(); test('test1','ascii'); test('test2','äöüßÄÖÜ'); $dbh->disconnect();
DBI is version 1.631, DBD::Pg is version 3.4.2 client_encoding=UTF8, server_encoding=UTF8 LANG=de_DE.UTF-8 LANGUAGE= LC_CTYPE="de_DE.UTF-8" LC_NUMERIC="de_DE.UTF-8" LC_TIME="de_DE.UTF-8" LC_COLLATE="de_DE.UTF-8" LC_MONETARY="de_DE.UTF-8" LC_MESSAGES="de_DE.UTF-8" LC_PAPER="de_DE.UTF-8" LC_NAME="de_DE.UTF-8" LC_ADDRESS="de_DE.UTF-8" LC_TELEPHONE="de_DE.UTF-8" LC_MEASUREMENT="de_DE.UTF-8" LC_IDENTIFICATION="de_DE.UTF-8" LC_ALL= ------------------- test1 is successful on DBI readback. psql raw results: <table border="1"> <tr> <th align="center">test</th> </tr> <tr valign="top"> <td align="left">ascii</td> </tr> </table> <p>(1 Zeile)<br /> </p> test1 is successful on psql readback. test2 is successful on DBI readback. psql raw results: <table border="1"> <tr> <th align="center">test</th> </tr> <tr valign="top"> <td align="left">äöüÃÃÃÃ</td> </tr> </table> <p>(1 Zeile)<br /> </p> test2 failed on psql readback. Got äöüßÄÖÜ where äöüßÄÖÜ was expected.
Can you monks tell me where i went wrong? What is happing here? Somehow the String is double utf-8 encoded in the database and perl partly seems to mask that after regexp operations? My goal is to insert utf-8 encoded strings into postgres and the strings should be the same there. In theorie that shouldn't be too hard, i keep failing at that...
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Oddities in UTF-8 and DBD::Pg?
by afoken (Chancellor) on Jun 28, 2015 at 07:32 UTC | |
|
Re: Oddities in UTF-8 and DBD::Pg?
by Anonymous Monk on Jun 28, 2015 at 06:18 UTC | |
by Anonymous Monk on Jun 28, 2015 at 07:35 UTC | |
by Anonymous Monk on Jun 28, 2015 at 08:16 UTC | |
|
Re: Oddities in UTF-8 and DBD::Pg?
by Anonymous Monk on Jun 28, 2015 at 09:18 UTC |