> Instead of returning a nice string of readable characters, $out (or $res, I'm not sure which) returns a string of octets corresponding to the individual bytes for these multibyte characters... I'd like to know: at what point is perl carrying out this conversion process...

The point is: Perl is not doing any conversion. It is giving you the "raw" binary byte stream from the source, exactly as received, without any kind of "interpretation" of it.
Whatever display tool you are using to view the data as it arrives (and just what are you using to view the data?), it is that tool which applies the "conversion" (the interpretation of the octet stream) that you find so confusing.
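To make the bytes-versus-characters distinction concrete, here is a minimal sketch. The literal octets below are my own stand-in for what $res->content might hold when a server sends UTF-8. Until you decode them, Perl just counts octets:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Three CJK characters, written as the raw UTF-8 octets that would
# arrive over the network (a stand-in for $res->content).
my $raw = "\xE4\xB8\xAD\xE6\x96\x87\xE5\xAD\x97";

# Perl has not interpreted anything: it sees nine separate octets.
printf "bytes:      %d\n", length $raw;                  # 9
printf "first unit: 0x%02X\n", ord substr($raw, 0, 1);   # 0xE4

# Only after decoding does Perl treat the data as three characters.
my $text = decode('UTF-8', $raw);
printf "characters: %d\n", length $text;                 # 3
printf "first char: U+%04X\n", ord substr($text, 0, 1);  # U+4E2D
```

Nothing in the first half of the script "converts" anything; a terminal showing you mojibake or the right glyphs is doing its own interpretation of those same nine octets.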
The right track, as rhesa pointed out, is to figure out which character encoding a given chunk of input content uses, decode it with Encode so that Perl applies the correct interpretation to the data, and then, depending on what sort of display tool you use, convert it to the appropriate character set for viewing. Something like this:
(updated to fix a discrepancy in the variable names)

```perl
use Encode;
...
my $inp_enc = ...;       # whatever it happens to be
my $out_enc = ':utf8';
# or: my $out_enc = ':encoding(big5)';
# (or whatever your display tool expects)
binmode STDOUT, $out_enc;
...
print decode( $inp_enc, $res->content ) if ( $res->is_success );
```
Here is how that works: the decode call turns the raw octets into a Perl character string (held internally as utf8). Then, whatever layer was set on STDOUT, print will automatically do the right thing (or try to), converting from the internal utf8 to the output encoding if need be as the content is written to that file handle.
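Here is a self-contained sketch of those two steps. A few things in it are my own assumptions rather than part of the snippet above: charset_from_content_type is a hypothetical helper that reads the charset from a Content-Type string; the input is made-up windows-1251 text; and an in-memory filehandle stands in for STDOUT so the encoded result can be inspected.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not from the post above): pull the charset out of
# a Content-Type header value, falling back to a default when none is given.
sub charset_from_content_type {
    my ($content_type, $default) = @_;
    my ($charset) = ($content_type // '') =~ /charset\s*=\s*"?([\w.:-]+)/i;
    return $charset // $default;
}

# Raw octets as they might arrive: the Russian word "Privet" in windows-1251.
my $raw     = "\xCF\xF0\xE8\xE2\xE5\xF2";
my $inp_enc = charset_from_content_type('text/html; charset=windows-1251', 'UTF-8');

# Step 1: decoding turns the octets into a Perl character string.
my $enc  = find_encoding($inp_enc) or die "Unknown encoding: $inp_enc\n";
my $text = $enc->decode($raw);

# Step 2: an :encoding() layer re-encodes the characters on output.
# The in-memory handle plays the role of STDOUT here.
open(my $out_fh, '>', \my $out) or die "Cannot open in-memory handle: $!";
binmode $out_fh, ':encoding(UTF-8)';
print {$out_fh} $text;
close $out_fh;

printf "%d octets in, %d characters, %d UTF-8 octets out\n",
    length $raw, length $text, length $out;
```

Six single-byte Cyrillic letters come in, Perl holds six characters, and the UTF-8 layer writes twelve octets, two per letter. I used ':encoding(UTF-8)' rather than ':utf8' on the handle; for output, both produce UTF-8.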
(Of course, if you choose a non-Unicode output encoding because of your display tool, be aware that printing, say, Chinese text while STDOUT is set to, say, cp1251 will give you lots of encoding errors and nothing worth looking at. That's the problem with non-Unicode character sets: they tend to be language-specific.)
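A quick sketch of that failure mode, using Encode's encode directly rather than a STDOUT layer so the result is easy to inspect. With the default CHECK value, each character that cp1251 cannot represent is replaced by a substitution character, so the Chinese text is simply lost:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Two Chinese characters as a Perl character string: U+4E2D U+6587.
my $chinese = "\x{4E2D}\x{6587}";

# cp1251 has no code points for CJK characters. With the default CHECK
# value, encode() substitutes a placeholder for each one instead of dying.
my $as_cp1251 = encode('cp1251', $chinese);
printf "cp1251: %d octets, original recoverable: %s\n",
    length $as_cp1251,
    (decode('cp1251', $as_cp1251) eq $chinese ? 'yes' : 'no');

# A Unicode encoding such as UTF-8 represents both characters.
my $as_utf8 = encode('UTF-8', $chinese);
printf "UTF-8:  %d octets\n", length $as_utf8;   # 6 octets
```
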
In reply to Re: Encoding Hell by graff, in thread Encoding Hell by kettle