Well, that's not clear. It's closer to utf8 than to anything else I'm aware of, but the second character you posted is a plain-ASCII "vertical bar", \x7c. A utf8 lead byte such as the initial A-tilde (\xC3) must be followed by a continuation byte in the range \x80-\xBF, so \xC3 followed by \x7c is an invalid, unusable byte sequence in utf8. That sort of problem would certainly explain the "?" you get when you try to convert this to latin1.
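To check a suspect byte pair like that one directly, something along these lines might do. It is only a sketch: the helper name is_valid_utf8 is made up for illustration, and it assumes the data is held as raw bytes (not an already-decoded Perl string). It relies on the core Encode module, which dies on malformed input when given FB_CROAK.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the raw byte string is well-formed UTF-8.
# decode() with FB_CROAK dies on malformed input, so eval traps the failure.
sub is_valid_utf8 {
    my ($bytes) = @_;    # work on a copy; decode() with a CHECK value may modify its argument
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# \xC3 followed by the vertical bar: lead byte without a continuation byte
print is_valid_utf8("\xC3\x7C") ? "valid" : "invalid", "\n";   # prints "invalid"

# \xC3 followed by \xBC: a proper two-byte sequence (u-umlaut)
print is_valid_utf8("\xC3\xBC") ? "valid" : "invalid", "\n";   # prints "valid"
```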
I couldn't find a hex tool here now, but I'll look for it.
Sounds like you really need one. All unix/linux systems have "od" (and GNU and others provide MS Windows ports); naturally, Perl can be used to provide this facility as well:

    @bytes = unpack "C*", $_;  # break utf8 string into bytes
    for ($i=0; $i<@bytes; $i+=8) {
        $j = ($i+7 < $#bytes) ? $i+7 : $#bytes;
        print join(" ", map {sprintf "%.2x", $bytes[$_]} $i .. $j), $/;
    }

(That's a real kluge, but good enough to start with.)
If, as seems possible, your DB entries contain corrupted utf8 character data, you'll need to diagnose the problems, patch them, and update the tables as needed -- you should be able to reconstruct the intended characters to replace mangled ones, based on context. Good luck with that.
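For the diagnosis step, it may help to know exactly which bytes in a value are at fault. The following is a rough sketch under some assumptions: the name bad_byte_offsets is invented here, the input is taken to be a raw byte string as fetched from the DB (no utf8 flag set), and the pattern follows the well-formed byte ranges given in the Unicode standard. It returns the offset of every byte that cannot start a well-formed utf8 sequence.

```perl
use strict;
use warnings;

# One well-formed UTF-8 encoded character, expressed as byte ranges.
my $utf8_char = qr/
    [\x00-\x7F]
  | [\xC2-\xDF][\x80-\xBF]
  | \xE0[\xA0-\xBF][\x80-\xBF]
  | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
  | \xED[\x80-\x9F][\x80-\xBF]
  | \xF0[\x90-\xBF][\x80-\xBF]{2}
  | [\xF1-\xF3][\x80-\xBF]{3}
  | \xF4[\x80-\x8F][\x80-\xBF]{2}
/x;

# Hypothetical helper: list the byte offsets where no well-formed
# sequence starts. Assumes $bytes is a raw byte string.
sub bad_byte_offsets {
    my ($bytes) = @_;
    my @offsets;
    pos($bytes) = 0;
    while (pos($bytes) < length $bytes) {
        next if $bytes =~ /\G$utf8_char/gc;   # good character: pos() advances past it
        push @offsets, pos($bytes);           # bad byte: record where it sits
        pos($bytes) = pos($bytes) + 1;        # skip it and resynchronise
    }
    return @offsets;
}

# The sequence discussed above: "a", \xC3, vertical bar, "b"
print join(",", bad_byte_offsets("a\xC3|b")), "\n";   # prints "1"
```

Each column value fetched from the table could be passed through a check like this, and only the rows that report offsets would need to be inspected by hand.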
In reply to Re: Re: Re: Encoding of DBI PostgreSQL output
by graff
in thread Encoding of DBI PostgreSQL output
by Kjetil