I keep not getting a complete solution, so: how do I turn on UTF-8 I/O by default on all streams (including STDIN/STDOUT/STDERR), regardless of environment settings?

I thought I was running UTF-8 on my terminal and to a file (and I am), but I got 'Wide character in print' errors when printing a legal UTF-8 char to my UTF-8 terminal. Using a binmode on STDERR, with :utf8, forced it, and that works fine. I had the same error when the string was echoed to a file, but adding a binmode there created an inexplicable infinite loop in my code (one that wasn't there before adding UTF-8 output).

I'm hoping that by uniformly treating all files as UTF-8 from the start, the problem will 'go away' -- either that or I find any inconsistent usages of 'bytes' vs. 'chars' and eliminate them, if that's the issue, but ideally, the problem will be solved by getting my I/O to a known state, consistently.

Instead of printing white space for my indentation in a formatted html output, I used the UTF-8 chars by changing a Readonly constant:

Readonly my $FullPrefix    => "\N{RIGHTWARDS DOUBLE ARROW}" x 2;
Readonly my $MinPrefix    => "\N{RIGHTWARDS DOUBLE ARROW}";

When it was a space (or 2), I had no problems other than not being able to tell what was my inserted spacing vs. spacing in the original source. I thought that changing it to a visibly unique character would make things easier, but then I ran into the wide-char output (I *thought* I was already in UTF-8 mode for I/O on all streams; AGAIN, I thought wrong). I had a binmode for STDOUT at the beginning, but left it off of STDERR, which is where my debug output is going.

But equally distressing (perhaps more so): the web input I'm getting is in UTF-8, but when I add a UTF-8 character to the output stream for storage in a cache file, I get the wide character error. Adding binmode on the file handle before I write, for some reason, causes an infinite recursion error as it tries to write out the content (using a routine named 'content', which is the one that gets called infinitely). Yet writing out spaces with no binmode call didn't trigger the recursion error.

So rather than having to do a binmode on each and every open call, I'd rather just tell perl once at the beginning of my program, to only speak UTF-8 on any streams.
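
The per-handle approach described above can be sketched roughly as follows. This is a minimal illustration, not the original program: the helper name `utf8_bytes_for` is hypothetical, and it writes to an in-memory buffer instead of a real cache file so the encoded bytes can be inspected.

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{...} names on perls older than 5.16

# Hypothetical helper: push a character string through an explicit UTF-8
# layer into an in-memory buffer and return the raw bytes that were written.
sub utf8_bytes_for {
    my ($text) = @_;
    my $buffer = '';
    open(my $fh, '>:encoding(UTF-8)', \$buffer)
        or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close in-memory handle: $!";
    return $buffer;
}

# Each standard handle has to be switched individually.
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');

my $MinPrefix = "\N{RIGHTWARDS DOUBLE ARROW}";
print STDERR "$MinPrefix debug line\n";    # no 'Wide character' warning now

my $bytes = utf8_bytes_for($MinPrefix);
printf "%d character became %d bytes\n", length($MinPrefix), length($bytes);
```

Every handle that lacks such a layer will still complain about wide characters, which is why this gets tedious once a program opens more than a couple of files.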

It may not solve the recursion problem, but it's a next step to moving that 'binmode' call out of the area where the recursion is triggered.

But it's also the case, that I want to put the 'UTF-8' forcing on all file handles into my default 'program template', since for my own development, my assumption/presumption is that I can always use UTF-8. Any environment where that's broken needs to be fixed -- OR I'll special case a particular program to handle such cases.

For whatever reason (the specific perls compiled for my particular OSes, the environments my perl programs run in, with different distribution releases, different env vars, different users, different OSes, etc., or whatever else figures in), Perl doesn't always get that it should only be speaking UTF-8.

Therefore, I want to force it to do 'The Right Thing(tm)', and then fix fallout from known 'Right' output.

Is there a 'use :utf8 everywhere' that actually does what I want, that I can include at the top of my progs?
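
One candidate that comes close is the core `open` pragma. The sketch below assumes it fits the use case. It sets a default layer for `open()` calls that give no layer of their own, and `:std` applies it to STDIN, STDOUT and STDERR as well. It is lexically scoped, so it does not reach handles opened inside other modules. The helper name `round_trip_bytes` is hypothetical.

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{...} names on perls older than 5.16
use File::Temp qw(tempfile);

# Default layer for every open() in this lexical scope; :std also covers
# STDIN, STDOUT and STDERR.
use open qw(:std :encoding(UTF-8));

# Hypothetical helper: write text with a plain open() (no layer given, so
# the pragma's default applies), then read the file back as raw bytes.
sub round_trip_bytes {
    my ($text) = @_;
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close($tmp_fh) or die "Cannot close temp handle: $!";

    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} $text;
    close($out) or die "Cannot close $path: $!";

    # An explicit layer overrides the pragma's default.
    open(my $in, '<:raw', $path) or die "Cannot read $path: $!";
    local $/;                         # slurp the whole file
    my $bytes = <$in>;
    close($in) or die "Cannot close $path: $!";
    return $bytes;
}

my $MinPrefix = "\N{RIGHTWARDS DOUBLE ARROW}";
print STDERR "$MinPrefix debug line\n";    # STDERR already has the layer
printf "%d bytes on disk\n", length(round_trip_bytes($MinPrefix));
```

Because of the lexical scoping, this belongs at the top of each file in a program template rather than in a single shared module.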

Many thanks and appreciations! ;-)

In reply to How to get UTF-8 perl all the time? by perl-diddler
