in reply to Re: Re: Encoding of DBI PostgreSQL output
in thread Encoding of DBI PostgreSQL output
Well, that's not clear... It's closer to utf8 than it is to anything else I'm aware of, but the second character you have posted there is a plain-ascii "vertical bar", \x7c, which in combination with the initial A-tilde (\xC3) constitutes an invalid, unusable byte sequence for utf8. That sort of problem would certainly explain the presence of a "?" when you try to convert this to latin1.
I couldn't find a hex tool here now, but I'll look for it.
Sounds like you really need one. All unix/linux systems have "od" (and GNU and others have MSwindows-ported versions); naturally, Perl can be used to provide this facility as well:
(That's a real kluge, but good enough to start with.)@bytes = unpack "C*", $_; # break utf8 string into bytes for ($i=0; $i<@bytes; $i+=8) { $j = ($i+7 < $#bytes) ? $i+7 : $#bytes; print join(" ", map {sprintf "%.2x", $bytes[$_]} $i .. $j), $/; }
If, as seems possible, your DB entries contain corrupted utf8 character data, you'll need to diagnose the problems, patch them, and update the tables as needed -- you should be able to reconstruct the intended characters to replace mangled ones, based on context. Good luck with that.
|
|---|