in reply to why no default unicode?

There are two good reasons. The first is backwards compatibility. Perl tries very hard not to break old programs, and there are a lot of old programs that would be broken by such a change.

The second reason is that as it is now, a program as simple as

while(<>) { print; }

just works, i.e. it prints out the same data that it reads. If STDOUT defaulted to UTF-8, Perl would also need to default to UTF-8 for reading operations.

And once that is the default, reading a non-UTF-8 file will suddenly either cause a fatal error or produce data that can't be interpreted correctly.
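A minimal sketch of that failure mode, assuming the input is Latin-1 text. It uses the core Encode module to stand in for what a strict UTF-8 input layer would have to do; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# "caf\xE9" is "café" in Latin-1; a lone 0xE9 byte is not valid UTF-8.
my $latin1_bytes = "caf\xE9\n";

# Byte semantics (today's default): what is read is what gets written.
my $copied = $latin1_bytes;

# Strict UTF-8 decoding of the same bytes dies. Decode a copy, because
# Encode may modify its source argument when a CHECK value is given.
my $scratch = $latin1_bytes;
my $decoded = eval { decode('UTF-8', $scratch, FB_CROAK) };
my $error   = $@;

print defined $decoded ? "decoded fine\n" : "decode failed: $error";
```

With a more lenient CHECK value the call would not die, but the bad byte would be replaced by a substitution character, which is the "can't be interpreted correctly" case.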

Re^2: why no default unicode?
by perl-diddler (Chaplain) on Mar 19, 2011 at 23:54 UTC
    Ok, I get this...but I copied my 'UTF-8' screen output to a file (inspected it with hexdump -C) and it has the UTF-8 chars in it. When I ran it through your prog, it auto-defaulted to UTF-8!!!

    So then I tried cut/paste directly into perl. Again, the same cut/paste I put into the above file.

    Ran the prog again. Same thing -- no complaint. So now I have it outputting my 'utf-8' characters with no complaint, but when I try to do it via perl's unicode facility, it doesn't work.

    Pure guess -- it's interpreting it as a byte stream, so byte in / byte out...perl thinks it's all 'bytes', but the term interprets the input and output as UTF-8 (the term is set up to pass UTF-8 chars through on input as well).

    So basically, if I want to safely use unicode in perl, I need to pre-convert my unicode chars into utf-8 byte-strings, and output them as simple byte strings? ...(yup, that works)...

    I guess I somehow thought that perl would now detect the terminal settings from the local/environment setting and set the unicode-ness of STD(IOER) automatically. Is that something that would be a bad thing for perl to do? I'm sure I'm missing some obvious point(s) somewhere...
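The "pre-convert to UTF-8 byte strings" approach mentioned above could look roughly like this. It is only a sketch: the sample character is arbitrary, and it assumes a terminal that displays UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string holding one character, U+263A (WHITE SMILING FACE).
my $chars = "\x{263A}";

# Encode it explicitly into a UTF-8 byte string (bytes E2 98 BA).
my $bytes = encode('UTF-8', $chars);

printf "%d character -> %d bytes\n", length($chars), length($bytes);

# STDOUT has no encoding layer here, so these bytes pass through
# untouched; a UTF-8 terminal renders them as a single glyph.
print $bytes, "\n";
```

The other common route is to leave the strings as characters and put the encoding on the handle instead, with binmode(STDOUT, ':encoding(UTF-8)').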

      When I ran it through your prog, it auto-defaulted to UTF-8!!!

      It did not, whatever you mean by that.

      Pure guess -- it's interpreting it as a byte stream, so byte in / byte out...perl thinks it's all 'bytes', but the term interprets the input and output as UTF-8 (the term is set up to pass UTF-8 chars through on input as well).

      Exactly.

      I guess I somehow thought that perl would now detect the terminal settings from the local/environment setting and set the unicode-ness of STD(IOER) automatically. Is that something that would be a bad thing for perl to do?

      As I wrote before, it would make it impossible to process binary data (or any non-UTF-8 data) out of the box. People want to do that regardless of whether they are in a UTF-8 console or not.
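For scripts that do want locale-driven behaviour, it can be opted into explicitly. The sketch below is one hand-rolled way to do it. The helper locale_wants_utf8 is hypothetical (not part of Perl), and its regex check on the locale name is a simplifying assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: guess from the usual locale variables whether the
# environment asks for UTF-8. The first variable that is set wins, in
# the standard precedence order LC_ALL, LC_CTYPE, LANG.
sub locale_wants_utf8 {
    my ($env) = @_;
    for my $name (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$name};
        next unless defined $value && length $value;
        return $value =~ /UTF-?8/i ? 1 : 0;
    }
    return 0;
}

# Only the standard handles get an encoding layer; files opened later
# are still plain byte streams unless a layer is requested for them.
if (locale_wants_utf8(\%ENV)) {
    binmode($_, ':encoding(UTF-8)') for \*STDIN, \*STDOUT, \*STDERR;
}
```

Perl also has built-in switches for this: running a script with perl -CSL applies UTF-8 to the standard handles only when the locale variables indicate UTF-8, and the open pragma accepts use open qw(:std :locale).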