in reply to Encoding is a pain.

You seem to have two major problems. One of them is a Perl problem; the other is not.

The Perl problem first: the encoding pragma doesn't specify anything about the I/O encoding the script should use; it only specifies the encoding the script itself is written in. Err... I was wrong here. The encoding pragma does set the encodings of STDIN and STDOUT, but as its documentation says (under USAGE): "Note that STDERR WILL NOT be changed." I assume the debugging output of XML::Parser goes to STDERR.
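
A minimal sketch of one workaround, assuming the terminal really does expect Shift_JIS: because the encoding pragma leaves STDERR alone, push an :encoding layer onto STDERR yourself with binmode. binmode and the :encoding(...) layer are core Perl (PerlIO with Encode); the helper name set_term_encoding is hypothetical.

```perl
use strict;
use warnings;

# Assumption: the terminal expects Shift_JIS. 'shiftjis' is the name
# Perl's Encode module uses for this encoding.
my $term_enc = 'shiftjis';

# Hypothetical helper: push the same :encoding layer onto every handle
# given. Passing STDERR here covers the handle the encoding pragma skips.
sub set_term_encoding {
    my ($enc, @handles) = @_;
    for my $fh (@handles) {
        binmode($fh, ":encoding($enc)") or die "binmode failed: $!";
    }
}

# STDIN can be handled the same way if it is needed.
set_term_encoding($term_enc, \*STDOUT, \*STDERR);

# U+65E5 U+672C ("Nihon", Japan) now goes out as Shift_JIS bytes
# instead of raising a "Wide character in print" warning.
print STDERR "\x{65E5}\x{672C}\n";
```

Doing all the handles in one place also means STDOUT and STDERR can't silently drift apart if the pragma's behaviour is misremembered.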

Your second problem is, as you've diagnosed, that Shift_JIS is under-specified, and possibly mis-specified as well. So why does your terminal use Shift_JIS and not UTF-8?
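
To make "under-specified" concrete, here is a small sketch assuming the core Encode module with its Japanese tables (Encode::JP). Perl offers two encodings that both get called Shift_JIS in everyday use: shiftjis (based on JIS X 0208) and cp932 (Microsoft's variant). The byte pair 0x81 0x60 is the well-known "wave dash" case: the JIS-based table is expected to decode it to U+301C WAVE DASH, while Microsoft's maps it to U+FF5E FULLWIDTH TILDE.

```perl
use strict;
use warnings;
use Encode qw(decode);

# 0x81 0x60: the JIS X 0208 "wave dash" position in Shift_JIS byte form.
my $bytes = "\x81\x60";

# Decode the same bytes with each "Shift_JIS" and show the code point.
for my $enc (qw(shiftjis cp932)) {
    my $char = decode($enc, $bytes);
    printf "%-8s -> U+%04X\n", $enc, ord $char;
}
```

If a terminal and a script disagree on which of these tables "Shift_JIS" means, some characters will come out wrong even when everything else is configured correctly.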


Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re^2: Encoding is a pain.
by zeimusu (Sexton) on Sep 21, 2004 at 15:22 UTC
    I assume the debugging output of XML::Parser goes to STDERR.

    This is where good assumptions beat RTFM(*). The XML::Parser POD tells us that the Debug style "prints out the document in outline form". That sounds like STDOUT to me, but the source tells a different story: the output goes to STDERR.

    Anyway, I now have a Deout.pm: the same as Debug.pm, but printing to STDOUT. My screen is less full of garbage, which is good.

    * This, of course, only applies to good assumptions (or dodgy manuals).
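
Deout.pm itself isn't posted in the thread, so the following is only a hypothetical sketch of the same idea. Rather than copying a Style module, it uses XML::Parser's documented Handlers option (Start, End and Char callbacks) and prints an outline to STDOUT, so whatever :encoding layer STDOUT carries applies to it. The sub names and the outline format are made up for illustration and are not a copy of Debug.pm.

```perl
use strict;
use warnings;

my $depth = 0;    # nesting level, maintained by the handlers below

# Start handler: XML::Parser passes (Expat, Element, Attr, Val, ...).
sub deout_start {
    my ($expat, $element, %attr) = @_;
    my $attrs = join ' ', map { qq{$_="$attr{$_}"} } sort keys %attr;
    print STDOUT '  ' x $depth, "<$element", ($attrs ne '' ? " $attrs" : ''), ">\n";
    $depth++;
}

# End handler: receives (Expat, Element).
sub deout_end {
    my ($expat, $element) = @_;
    $depth--;
    print STDOUT '  ' x $depth, "</$element>\n";
}

# Char handler: receives (Expat, String); may fire several times per text node.
sub deout_char {
    my ($expat, $text) = @_;
    return if $text =~ /^\s*$/;    # skip whitespace-only runs
    print STDOUT '  ' x $depth, "|| $text\n";
}

# Usage, assuming XML::Parser is installed and STDOUT already carries the
# terminal's encoding (e.g. via binmode or the encoding pragma):
#   use XML::Parser;
#   my $p = XML::Parser->new(Handlers => {
#       Start => \&deout_start, End => \&deout_end, Char => \&deout_char,
#   });
#   $p->parsefile('doc.xml');
```

Using Handlers keeps the callbacks in the script itself, so there is no extra module to install alongside XML::Parser's own styles.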