The byte stream must also be decoded properly ...

That's the point that rhesa and I were making, and which was absent in the OP code.

... otherwise perl makes assumptions about the input byte stream.

Well, if you want to put it in those terms, you could say "perl assumes that whatever byte stream comes in, that is what will be printed" (unless your script specifically applies some other interpretation or conversion, either using Encode or via a PerlIO encoding layer on the output file handle).
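
To make that concrete, here is a minimal sketch using in-memory file handles. The sample bytes and variable names are my own, chosen only for illustration, and it assumes the Encode::JP tables bundled with Perl 5.8 and later are available under the name "shiftjis". With no layer, the bytes pass through untouched; with explicit layers, the same bytes are read as characters and written back out as UTF-8:

```perl
use strict;
use warnings;

# Illustrative input: the Shift-JIS bytes for two hiragana, U+3042 and U+3044.
my $sjis_bytes = "\x82\xA0\x82\xA2";

# 1. No layer at all: whatever bytes come in are the bytes that get printed.
my $raw_out = '';
open(my $raw_fh, '>', \$raw_out) or die "open failed: $!";
print {$raw_fh} $sjis_bytes;
close($raw_fh);

# 2. Explicit PerlIO layers: decode on input, encode on output.
open(my $in_fh, '<:encoding(shiftjis)', \$sjis_bytes) or die "open failed: $!";
my $chars = <$in_fh>;    # now a string of two characters, not four bytes
close($in_fh);

my $utf8_out = '';
open(my $out_fh, '>:encoding(UTF-8)', \$utf8_out) or die "open failed: $!";
print {$out_fh} $chars;
close($out_fh);

print "no layer:    ", unpack('H*', $raw_out),  "\n";   # 82a082a2
print "with layers: ", unpack('H*', $utf8_out), "\n";   # e38182e38184
```

Nothing in the first half tells perl what the bytes mean, so it has no reason to touch them; the conversion in the second half happens only because the script asks for it.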

leaving a shift-jis encoded byte stream as is, and then expecting the unicode decoding of this stream to work properly is not Ok

I'm not sure what you're talking about here. If you know you have shift-jis data, and you want to convert it to unicode, that's definitely okay, so long as you actually apply some process to do that (perl won't do it "implicitly").
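
For what it's worth, "applying some process" can be as small as this. It is a sketch only, assuming the input really is Shift-JIS and using made-up sample bytes; `decode` and `encode` are the standard functions exported by the Encode module:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative input: Shift-JIS bytes for the hiragana U+3042 and U+3044.
my $sjis_bytes = "\x82\xA0\x82\xA2";
my $bytes_in   = length $sjis_bytes;

# Step 1: tell perl how to interpret the bytes (bytes -> characters).
my $chars = decode('shiftjis', $sjis_bytes);

# Step 2: choose the encoding you want on the way out (characters -> bytes).
my $utf8_bytes = encode('UTF-8', $chars);

printf "%d bytes in, %d characters, %d bytes out\n",
    $bytes_in, length $chars, length $utf8_bytes;
# prints "4 bytes in, 2 characters, 6 bytes out"
```

If either step is left out, perl just carries the original bytes along as they are.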

(update: I just remembered something: if you happen to be running Perl 5.8.0 on a Red Hat 9 system, there is a good chance that your defaults include a "locale" setting, which, on that combination of Perl/OS versions, caused Perl to make an implicit ("default") attempt to coerce input/output data between unicode and the encoding implied by the locale. This murdered countless applications and was fixed in later versions of Perl. If this is your situation, it's long past time to upgrade.)

It is clear from the code that this is understood but the wording of this post unnecessarily obfuscates the fact that perl has default settings which are not always appropriate.

Again, this is a bit hard to follow... which code are you referring to here? Which wording is obfuscating? Of course default settings are not always appropriate -- that's why there are alternatives to default settings...

I don't really know why this post turned so negative;

Me neither. That first reply (and its subthread) really threw me. If anything I said seemed negative, I apologize for that -- I generally try to keep my tone neutral, but of course I don't always succeed.

(updated to fix typos)


In reply to Re^3: Encoding Hell by graff
in thread Encoding Hell by kettle
