comment on

The two characters Ă| imply UTF8, doesn't it?

Well, that's not clear... It's closer to utf8 than it is to anything else I'm aware of, but the second character you have posted there is a plain-ascii "vertical bar", \x7c, which in combination with the initial A-tilde (\xC3) constitutes an invalid, unusable byte sequence for utf8. That sort of problem would certainly explain the presence of a "?" when you try to convert this to latin1.

I couldn't find a hex tool here now, but I'll look for it.

Sounds like you really need one. All unix/linux systems have "od" (and GNU and others have MSwindows-ported versions); naturally, Perl can be used to provide this facility as well:

@bytes = unpack "C*", $_; # break utf8 string into bytes
for ($i=0; $i<@bytes; $i+=8) {
   $j = ($i+7 < $#bytes) ? $i+7 : $#bytes;
   print join(" ", map {sprintf "%.2x", $bytes[$_]} $i .. $j), $/;
}
[download]

(That's a real kluge, but good enough to start with.)

If, as seems possible, your DB entries contain corrupted utf8 character data, you'll need to diagnose the problems, patch them, and update the tables as needed -- you should be able to reconstruct the intended characters to replace mangled ones, based on context. Good luck with that.

In reply to Re: Re: Re: Encoding of DBI PostgreSQL output by graff
in thread Encoding of DBI PostgreSQL output by Kjetil

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.