
You said:

Instead of returning a nice string of readable characters, $out (or $res I'm not sure which) returns a string of octets corresponding to the individual bytes for these multibyte characters... I'd like to know: at what point is perl carrying out this conversion process...

The point is: perl is not doing any conversion -- it is giving you the "raw" binary byte stream from the source, without doing any kind of "interpretation" of it.

Whatever display tool you are using to view the data as it arrives (and just what are you using to view the data?), it's that tool which is applying the "conversion" (the interpretation of the octet stream) that you find so confusing.

The right track, as rhesa indicated, is to work out which character encoding a given chunk of input content uses, and then use Encode so that perl interprets the data correctly and, depending on what sort of display tool you use, converts it to the character set that tool expects. Something like this:

use Encode;

...

my $inp_enc = ...;  # whatever it happens to be

my $out_enc = ':utf8';
# or: my $out_enc = ':encoding(big5)';
# (or whatever your display tool expects)

binmode STDOUT, $out_enc;

...

print decode( $inp_enc, $res->content ) if ( $res->is_success );

(updated to fix a discrepancy in the variable names).

The way that works is: the decode call converts the raw octets into a perl character string (stored in perl's internal utf8-based representation); then print encodes that string according to whatever layer was set on STDOUT, converting it to something other than utf8 if need be, as the content is written to that file handle.
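To make the octets-versus-characters distinction concrete, here is a minimal sketch. The hard-coded byte string is a hypothetical stand-in for what $res->content might return, and it assumes the source really is UTF-8; in practice you would use whatever $inp_enc turns out to be.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 octets for the two characters U+4E2D U+6587,
# standing in for the raw content of an HTTP response.
my $octets = "\xE4\xB8\xAD\xE6\x96\x87";

# decode() interprets the octets and returns a perl character string.
my $chars = decode('UTF-8', $octets);

# length() counts octets before decoding and characters after.
my $octet_count = length $octets;
my $char_count  = length $chars;

# The output layer turns the characters back into octets on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$octet_count octets, $char_count characters: $chars\n";
```

Without the decode step, perl would treat each of the six octets as a separate character, which is exactly the "string of octets" effect described in the question.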

(Of course, if you want to output a non-unicode encoding because of your display tool, understand that you will get lots of encoding errors, and nothing worth looking at, if you try printing, say, Chinese text when STDOUT is set to, say, cp1251, which is a Cyrillic code page. That's the problem with non-unicode character sets: they tend to be language-specific.)
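The mismatch can be checked up front rather than discovered on screen. The following is a sketch only, using the same hypothetical two-character Chinese sample; it asks Encode to die instead of quietly substituting when a character has no mapping in the target encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: two Chinese characters, decoded from UTF-8 octets.
my $chars = decode('UTF-8', "\xE4\xB8\xAD\xE6\x96\x87");

# FB_CROAK makes encode() die on the first character the target encoding
# cannot represent; LEAVE_SRC keeps encode() from modifying $chars.
my $bytes = eval {
    encode('cp1251', $chars, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
my $error = $@;

print defined $bytes
    ? "encoded ok\n"
    : "cp1251 cannot represent this text: $error";
```

A check like this is one way to decide whether a given output encoding is usable for the content before sending it to the display tool.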

In reply to Re: Encoding Hell by graff
in thread Encoding Hell by kettle
