How to get UTF-8 perl all the time?

perl-diddler has asked for the wisdom of the Perl Monks concerning the following question:

I keep not getting a complete solution, but how do I turn on UTF-8 I/O by default on all streams (including STDIN, STDOUT, and STDERR), regardless of environment settings?

I thought I was running UTF-8 (and am) on my terminal and to a file, but I got 'Wide character in print' warnings when my program first printed a legal UTF-8 character to my UTF-8 terminal. (I used binmode to set STDERR to :utf8 to force it, and that works fine.) I got the same warning when the string was echoed to a file, but adding a binmode there created an inexplicable infinite loop in my code (one that wasn't there before I added UTF-8 output).
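For reference, a minimal sketch of the symptom described above, assuming the usual cause: a character above U+00FF is printed to a handle that has no encoding layer. In-memory files stand in for the terminal and the output file, and the variable names are illustrative, not taken from the original program.

```perl
use strict;
use warnings;

# One character above U+00FF (RIGHTWARDS DOUBLE ARROW).
my $arrow = "\x{21D2}";

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Handle without an encoding layer: print should warn about a wide character.
open my $raw, '>', \my $raw_buf or die "open: $!";
print {$raw} $arrow;
close $raw;

# Same print after binmode sets an encoding layer: no warning expected.
open my $enc, '>', \my $enc_buf or die "open: $!";
binmode $enc, ':encoding(UTF-8)';
print {$enc} $arrow;
close $enc;

printf "warnings collected: %d\n", scalar @warnings;
printf "bytes written through the layer: %d\n", length $enc_buf;
```

The same binmode call works on STDOUT and STDERR; each handle needs its own call, which is the repetition the question is trying to avoid.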

I'm hoping that by uniformly treating all files as UTF-8 from the start, the problem will 'go away' -- either that or I find any inconsistent usages of 'bytes' vs. 'chars' and eliminate them, if that's the issue, but ideally, the problem will be solved by getting my I/O to a known state, consistently.

Instead of printing whitespace for indentation in my formatted HTML output, I switched to a visible Unicode character by changing a Readonly constant:

Readonly my $FullPrefix    => "\N{RIGHTWARDS DOUBLE ARROW}" x 2;
Readonly my $MinPrefix    => "\N{RIGHTWARDS DOUBLE ARROW}";

When it was a space (or two), I had no problems other than not being able to tell my inserted spacing from the spacing in the original source. I thought that changing it to a visibly unique character would make things easier, but then I ran into the wide-character warning on output (I *thought* I was already in UTF-8 mode for I/O on all streams; AGAIN, I thought wrong). I had a binmode for STDOUT at the beginning, but left it off STDERR, which is where my debug output goes.

But equally (perhaps more) distressing: the web input I'm getting is in UTF-8, yet when I add a UTF-8 character to the output stream for storage in a cache file, I get the wide-character warning. Adding binmode on the file handle before I write causes, for some reason, an infinite recursion error as it tries to write out the content (using a routine named 'content', which is the one getting called infinitely). Yet writing out spaces with no binmode call didn't trigger the recursion error.
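One alternative to calling binmode just before the write is to attach the encoding layer in the open call itself. The sketch below only illustrates that idea; the temporary file is a hypothetical stand-in for the cache file, and nothing here reproduces or explains the recursion in the original 'content' routine.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the cache file; the real path is not shown in the post.
my (undef, $cache_path) = tempfile(UNLINK => 1);

# A character string (not bytes) containing a non-ASCII character.
my $text = "\x{21D2} indented line\n";

# Attach the encoding layer as part of open, so no later binmode is needed.
open my $out, '>:encoding(UTF-8)', $cache_path or die "open $cache_path: $!";
print {$out} $text;
close $out or die "close $cache_path: $!";

# Read it back through the same layer to get characters again.
open my $in, '<:encoding(UTF-8)', $cache_path or die "open $cache_path: $!";
my $round_trip = do { local $/; <$in> };
close $in;

print "round trip ok\n" if $round_trip eq $text;
```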

So rather than having to do a binmode on each and every open call, I'd rather just tell perl once at the beginning of my program, to only speak UTF-8 on any streams.

It may not solve the recursion problem, but it's a step toward moving that 'binmode' call out of the area where the recursion is triggered.

I also want to put the 'UTF-8' forcing on all file handles into my default 'program template', since for my own development my assumption is that I can always use UTF-8. Any environment where that's broken needs to be fixed -- OR I'll special-case a particular program to handle it.

For whatever reason, whether it's the specific perls compiled for my particular OSes, the environments my Perl programs run in (different distribution releases, different env var settings, different users, different OSes, etc.), or something else, Perl doesn't always get that it should be speaking only UTF-8.

Therefore, I want to force it to do 'The Right Thing(tm)', and then fix the fallout from known 'Right' output.

Is there a 'use :utf8 everywhere' that actually does what I want, that I can include at the top of my programs?

Many thanks and appreciations! ;-)


Re: How to get UTF-8 perl all the time? by Anonymous Monk on Oct 17, 2010 at 06:19 UTC
Try `use open qw' :std :encoding(UTF-8) '; use open qw' IO :encoding(UTF-8) ';`
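A minimal sketch of the pragma in use, assuming a scratch file created with File::Temp (a hypothetical stand-in for any file the program writes). It uses a single `use open` statement; as far as the pragma's documentation suggests, a bare `:encoding(...)` argument sets the default for both input and output, so the separate `IO` form may be redundant, but treat that as an assumption to verify on your perl.

```perl
use strict;
use warnings;
# Default layer for open() calls in this file, plus STDIN/STDOUT/STDERR via :std.
use open qw(:std :encoding(UTF-8));
use File::Temp qw(tempfile);

# Collect warnings so a "Wide character" complaint would be noticed.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical scratch file standing in for any output file.
my (undef, $path) = tempfile(UNLINK => 1);

# No layer in the open mode and no binmode: the pragma supplies the default.
open my $fh, '>', $path or die "open $path: $!";
print {$fh} "\x{21D2}\n";
close $fh or die "close $path: $!";

# STDERR is covered by :std, so debug output needs no binmode either.
print STDERR "\x{21D2} debug line\n";

printf "wide-character warnings: %d\n", scalar @warnings;
printf "file size in bytes: %d\n", -s $path;
```

Note that the pragma is lexically scoped: it sets the default for open() calls compiled within its scope, while :std changes the three standard handles themselves. Handles opened inside other modules are not affected, so a program template built on it may still need an explicit layer in a few places.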
Re^2: How to get UTF-8 perl all the time? by perl-diddler (Chaplain) on Oct 17, 2010 at 07:20 UTC
Thanks...that did it! Now my recursion problem is full time! ;-) Obviously I have my work cut out for me... ;-) Hey, predictable bugs are much better than unpredictable ones...