in reply to Re: why no default unicode?
in thread why no default unicode?

Ok, I get this...but I copied my 'UTF-8' screen output to a file (inspected it with hexdump -C) and it has the UTF-8 chars in it. When I ran it through your prog, it auto-defaulted to UTF-8!!!

So then I tried cut/paste directly into perl. Again, the same cut/paste I put into the above file.

Ran the prog again. Same thing -- no complaint. So now I have it outputting my 'utf-8' characters with no complaint, but when I try to do it via perl's unicode facility, it doesn't work.

Pure guess -- it's interpreting it as a byte stream, so byte in / byte out...perl thinks it's all 'bytes', but the term interprets the input and output as UTF-8 (the term is set up to pass UTF-8 chars through on input as well).
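
For illustration, a minimal sketch of what "byte in / byte out" means on the Perl side. The variable names are made up for this example, and the two octets are spelled out with escapes so the result does not depend on how the source file is saved:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two octets a UTF-8 terminal sends for U+00E9 (e with acute accent).
my $raw = "\xC3\xA9";

# Without a decoding step, Perl sees two separate one-octet characters.
printf "undecoded length: %d\n", length $raw;

# After decoding, the same data is a single character.
my $text = decode('UTF-8', $raw);
printf "decoded length:   %d\n", length $text;
printf "code point:       U+%04X\n", ord $text;
```

The undecoded string has length 2 and the decoded one has length 1, which is why output can look right on a UTF-8 terminal even though Perl never treated the data as characters.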

So basically, if I want to safely use unicode in perl, I need to pre-convert my unicode chars into utf-8 byte-strings, and output them as simple byte strings? ...(yup, that works)...
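
Roughly, the "pre-convert and print as bytes" approach could look like the sketch below. It assumes the core Encode module; the sample string and variable names are only illustrative:

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string holding U+00E9 and U+20AC, written with escapes
# so the source file itself stays pure ASCII.
my $chars = "caf\x{e9} \x{20ac}";

# Convert characters to UTF-8 octets explicitly before printing.
my $bytes = encode('UTF-8', $chars);

printf "%d characters, %d bytes\n", length($chars), length($bytes);

# STDOUT has no encoding layer here, so the octets pass through unchanged.
print $bytes, "\n";
```

Printing $chars directly instead of $bytes would trigger a "Wide character in print" warning, because the handle has not been told which encoding to use.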

I guess I somehow thought that perl would now detect the terminal settings from the locale/environment setting and set the unicode-ness of STDIN/STDOUT/STDERR automatically. Is that something that would be a bad thing for perl to do? I'm sure I'm missing some obvious point(s) somewhere...

Re^3: why no default unicode?
by moritz (Cardinal) on Mar 20, 2011 at 07:24 UTC
    When I ran it through your prog, it auto-defaulted to UTF-8!!!

    It did not, whatever you mean by that.

    Pure guess -- it's interpreting it as a byte stream, so byte in / byte out...perl thinks it's all 'bytes', but the term interprets the input and output as UTF-8 (the term is set up to pass UTF-8 chars through on input as well).

    Exactly.

    I guess I somehow thought that perl would now detect the terminal settings from the locale/environment setting and set the unicode-ness of STDIN/STDOUT/STDERR automatically. Is that something that would be a bad thing for perl to do?

    As I wrote before, it would make it impossible to process binary data (or any non-UTF-8 data) out of the box. People want to do that, independently of whether they are in a UTF-8 console or not.
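
A small sketch of the opt-in alternative, assuming you want UTF-8 on STDOUT only and raw bytes everywhere else. The in-memory handle and the sample octets are just stand-ins for a real binary file:

```perl
use strict;
use warnings;

# Opt in for this one handle: characters printed to STDOUT are encoded as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "caf\x{e9}\n";

# A handle meant for binary data stays in byte mode and is written byte for byte.
my $octets = "\xFF\xFE\x00\x01";    # not valid UTF-8 on purpose
my $buffer = '';
open(my $out, '>', \$buffer) or die "cannot open in-memory handle: $!";
binmode($out);
print {$out} $octets;
close $out;

printf "binary round trip intact: %s\n", $buffer eq $octets ? 'yes' : 'no';
```

Because the encoding layer is requested per handle, the same script can talk to a UTF-8 terminal and still copy arbitrary binary data untouched. The open pragma offers a way to set such layers as defaults for a lexical scope if that is preferred.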