Re: Re: Re: Encoding of DBI PostgreSQL output

The two characters Ã| imply UTF8, doesn't it?

Well, that's not clear... It's closer to UTF-8 than to anything else I'm aware of, but the second character you have posted there is a plain-ASCII "vertical bar", \x7c. In UTF-8, an initial A-tilde byte (\xC3) starts a two-byte sequence and must be followed by a continuation byte in the range \x80-\xBF, so \xC3 followed by \x7c is an invalid, unusable byte sequence. That sort of problem would certainly explain the "?" you get when you try to convert this to Latin-1.
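To check a suspicious byte sequence directly, here is a minimal sketch using the core Encode module; the helper name is_valid_utf8 is made up for illustration, and it assumes the value is available as an undecoded byte string. It compares the posted bytes \xC3 \x7c with a genuine two-byte sequence (\xC3 \xA6, the character U+00E6), and also shows how a lenient decode followed by conversion to Latin-1 can end up producing a "?". Whether your own conversion goes through exactly these steps is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return 1 if $bytes is well-formed UTF-8, 0 if not.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK: die on malformed input instead of substituting U+FFFD.
    # LEAVE_SRC: do not modify the source string while checking.
    my $ok = eval {
        decode('UTF-8', $bytes, FB_CROAK() | LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $posted  = "\xC3\x7C";   # A-tilde byte followed by a vertical bar
my $genuine = "\xC3\xA6";   # a real two-byte UTF-8 sequence (U+00E6)

printf "%-6s %s\n", 'C3 7C', is_valid_utf8($posted)  ? 'valid' : 'invalid';
printf "%-6s %s\n", 'C3 A6', is_valid_utf8($genuine) ? 'valid' : 'invalid';

# With the default (lenient) check, decode() substitutes U+FFFD for the
# malformed input, and encoding U+FFFD to Latin-1 yields a literal "?".
my $latin1 = encode('iso-8859-1', decode('UTF-8', $posted));
print "latin1: $latin1\n";
```

Using FB_CROAK inside eval turns "is this valid?" into a simple yes/no, instead of silently getting replacement characters.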

I couldn't find a hex tool here now, but I'll look for it.

Sounds like you really need one. All Unix/Linux systems have "od" (e.g. "od -t x1 file" prints each byte in hex), and GNU and others provide ports of it for MS Windows; naturally, Perl can provide this facility as well:

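my $str = $_;                                 # the string to dump (e.g. a value fetched via DBI)
utf8::encode($str) if utf8::is_utf8($str);    # if it is a character string, dump its UTF-8 bytes
my @bytes = unpack "C*", $str;                # break the string into byte values
for (my $i = 0; $i < @bytes; $i += 8) {
    my $j = ($i + 7 < $#bytes) ? $i + 7 : $#bytes;    # last index on this 8-byte row
    print join(" ", map { sprintf "%.2x", $bytes[$_] } $i .. $j), "\n";
}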

(That's a real kluge, but good enough to start with.)
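If you want something a little less klugey, the sketch below uses unpack's "(H2)*" group template to get one two-digit hex string per byte and prefixes each 8-byte row with its offset, loosely like od's output. The name hex_rows is a hypothetical helper, not from any module. As in the loop above, a string carrying Perl's internal UTF-8 flag is first encoded so that you see its UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: format a string as hex-dump rows of 8 bytes,
# each row prefixed with its starting byte offset (in hex).
sub hex_rows {
    my ($str) = @_;                               # work on a copy
    utf8::encode($str) if utf8::is_utf8($str);    # dump the raw UTF-8 bytes
    my @hex = unpack '(H2)*', $str;               # one 2-digit hex string per byte
    my @rows;
    for (my $off = 0; $off < @hex; $off += 8) {
        my $last = $off + 7 < $#hex ? $off + 7 : $#hex;
        push @rows, sprintf '%04x: %s', $off, join ' ', @hex[$off .. $last];
    }
    return @rows;
}

print "$_\n" for hex_rows("\xC3\x7C plus some more text");
```

Returning the rows instead of printing them makes the helper easy to reuse, e.g. to log the bytes of a suspicious column value.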

If, as seems possible, your DB entries contain corrupted UTF-8 character data, you'll need to diagnose the problems, patch them, and update the tables as needed; based on context, you should be able to reconstruct the intended characters and replace the mangled ones. Good luck with that.
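For the diagnosis step, one possible approach is to scan each value for the first byte where well-formed UTF-8 breaks down and print a few bytes of context in hex. The sketch below is only a starting point: first_bad_utf8_offset is a hypothetical helper, the pattern follows the UTF-8 byte ranges of RFC 3629, and the @rows list is an invented stand-in for (id, value) pairs you would fetch with DBI, assumed to arrive as undecoded byte strings.

```perl
use strict;
use warnings;

# One well-formed UTF-8 character (byte ranges per RFC 3629).
my $utf8_char = qr/
      [\x00-\x7F]
    | [\xC2-\xDF][\x80-\xBF]
    | \xE0[\xA0-\xBF][\x80-\xBF]
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
    | \xED[\x80-\x9F][\x80-\xBF]
    | \xF0[\x90-\xBF][\x80-\xBF]{2}
    | [\xF1-\xF3][\x80-\xBF]{3}
    | \xF4[\x80-\x8F][\x80-\xBF]{2}
/x;

# Hypothetical helper: byte offset of the first malformed UTF-8 sequence,
# or undef if the whole string is well-formed.
sub first_bad_utf8_offset {
    my ($bytes) = @_;
    utf8::encode($bytes) if utf8::is_utf8($bytes);   # make sure we scan bytes
    pos($bytes) = 0;
    1 while $bytes =~ /\G$utf8_char/gc;   # step over valid characters; /c keeps pos on failure
    my $off = pos($bytes);
    return $off == length($bytes) ? undef : $off;
}

# Invented sample rows standing in for (id, value) pairs fetched via DBI.
my @rows = ([1, "caf\xC3\xA9"], [2, "\xC3|x"]);
for my $row (@rows) {
    my ($id, $val) = @$row;
    my $off = first_bad_utf8_offset($val);
    next unless defined $off;
    my $ctx = join ' ', unpack '(H2)*', substr($val, $off, 4);
    print "row $id: malformed UTF-8 at byte $off ($ctx ...)\n";
}
```

Matching one character at a time with \G and /gc sidesteps the recursion limit Perl can hit when a single (?:...)* pattern runs over a very long value; deciding what each mangled sequence was meant to be is still up to you.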
